Re: Need help optimizing this query

From: "Pat Maddox" <pergesu(at)gmail(dot)com>
To: "Michael Glaesemann" <grzm(at)seespotcode(dot)net>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Need help optimizing this query
Date: 2007-07-18 22:10:56
Message-ID: 810a540e0707181510y253fbbb3x678d6bea09e47315@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 7/18/07, Pat Maddox <pergesu(at)gmail(dot)com> wrote:
> On 7/18/07, Michael Glaesemann <grzm(at)seespotcode(dot)net> wrote:
> >
> > On Jul 18, 2007, at 16:12 , Pat Maddox wrote:
> >
> > > ERROR: invalid reference to FROM-clause entry for table "video_views"
> > > LINE 20: JOIN assets ON (video_views.video_id=videos.id)
> > > ^
> > > HINT: There is an entry for table "video_views", but it cannot be
> > > referenced from this part of the query.
> >
> > It's because I mismatched the JOIN clauses during my copy-and-paste :(
> >
> > > On 7/18/07, Michael Glaesemann <grzm(at)seespotcode(dot)net> wrote:
> >
> > >> FROM video_views
> > >> JOIN assets ON (video_views.video_id=videos.id)
> > >> JOIN videos ON (video_views.asset_id=assets.id)
> >
> > This should be
> >
> > FROM video_views
> > JOIN assets ON (video_views.asset_id=assets.id)
> > JOIN videos ON (video_views.video_id=videos.id)
> >
> > Do you have the EXPLAIN ANALYE output of the query?
> >
> > Michael Glaesemann
> > grzm seespotcode net
> >
> >
> >
>
> For some reason the functions you wrote are giving me trouble (there's
> a BIGINT involved, I tried changing the functions around but kept
> having issues). So here's the full query, hopefully formatted better:
>
> SELECT
> SUM(CASE WHEN (hit IS TRUE AND logged_in IS TRUE) THEN 1 ELSE 0
> END) AS count_hits_console,
> SUM(CASE WHEN (hit IS TRUE AND logged_in IS FALSE) THEN 1 ELSE 0
> END) AS count_hits_remote,
> SUM(CASE WHEN (played IS TRUE AND logged_in IS TRUE) THEN 1 ELSE 0
> END) AS count_played_console,
> SUM(CASE WHEN (played IS TRUE AND logged_in IS FALSE) THEN 1 ELSE
> 0 END) AS count_played_remote,
> SUM(CASE WHEN (downloaded IS TRUE AND logged_in IS TRUE) THEN 1
> ELSE 0 END) AS count_downloaded_console,
> SUM(CASE WHEN (downloaded IS TRUE AND logged_in IS FALSE) THEN 1
> ELSE 0 END) AS count_downloaded_remote,
> SUM(CASE WHEN ((played IS TRUE OR downloaded IS TRUE) AND
> logged_in IS TRUE) THEN assets.size ELSE 0 END) as
> download_size_console,
> SUM(CASE WHEN ((played IS TRUE OR downloaded IS TRUE) AND
> logged_in IS FALSE) THEN assets.size ELSE 0 END) AS
> download_size_remote,
> videos.id,
> videos.title,
> videos.guid
> FROM video_views
> JOIN assets ON (video_views.asset_id=assets.id)
> JOIN videos on (video_views.video_id=videos.id)
> WHERE videos.company_id=1
> GROUP BY videos.id,
> videos.title,
> videos.guid
> ORDER BY count_hits_remote DESC
> LIMIT 100
>
>
>
> and here's the EXPLAIN ANALYZE output:
>
> Limit (cost=127072.90..127073.12 rows=87 width=64) (actual
> time=2636.560..2636.567 rows=20 loops=1)
> -> Sort (cost=127072.90..127073.12 rows=87 width=64) (actual
> time=2636.558..2636.562 rows=20 loops=1)
> Sort Key: sum(CASE WHEN ((video_views.hit IS TRUE) AND
> (video_views.logged_in IS FALSE)) THEN 1 ELSE 0 END)
> -> HashAggregate (cost=127067.49..127070.10 rows=87
> width=64) (actual time=2636.481..2636.506 rows=20 loops=1)
> -> Hash Join (cost=880.96..125995.46 rows=38983
> width=64) (actual time=24.904..2635.719 rows=122 loops=1)
> Hash Cond: (video_views.asset_id = assets.id)
> -> Hash Join (cost=195.96..124433.01 rows=39009
> width=60) (actual time=8.327..2618.982 rows=122 loops=1)
> Hash Cond: (video_views.video_id = videos.id)
> -> Seq Scan on video_views
> (cost=0.00..101352.70 rows=5998470 width=12) (actual
> time=0.031..1410.231 rows=5998341 loops=1)
> -> Hash (cost=194.87..194.87 rows=87
> width=52) (actual time=1.001..1.001 rows=90 loops=1)
> -> Bitmap Heap Scan on videos
> (cost=4.93..194.87 rows=87 width=52) (actual time=0.111..0.840 rows=90
> loops=1)
> Recheck Cond: (company_id = 1)
> -> Bitmap Index Scan on
> index_videos_on_company_id (cost=0.00..4.90 rows=87 width=0) (actual
> time=0.079..0.079 rows=90 loops=1)
> Index Cond: (company_id = 1)
> -> Hash (cost=487.78..487.78 rows=15778
> width=12) (actual time=16.527..16.527 rows=15778 loops=1)
> -> Seq Scan on assets (cost=0.00..487.78
> rows=15778 width=12) (actual time=0.023..9.601 rows=15778 loops=1)
> Total runtime: 2637.043 ms
> (17 rows)
>
>
> That one runs reasonably fine, because there are only 20 videos being
> returned and a handful of video views associated with them. In the
> real query there are about 1k videos and a couple million views. That
> took about 80 minutes to run, according to logs.
>
> Pat
>

Okay so it doesn't normally take 80 minutes to run. Something funky
just happened and it took that long once. It usually takes between
90-100 seconds. My coworker told me it takes 80 minutes but it
appears that's an anomaly.

One thing we were wondering is if all the aggregate calculations might
be slowing it down, and if it might be faster to do six separate
queries. The real problem there is sorting and merging the data sets.
Merging isn't tough, but making sure that all queries obey the
desired sort order is. What do you think?

Pat

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Michael Glaesemann 2007-07-18 22:11:12 Re: Need help optimizing this query
Previous Message Roderick A. Anderson 2007-07-18 22:03:19 Re: DBI/DBD::Pg and transactions