Re: Additional select fields in a GROUP BY

From: Vitaly Belman <vitalyb(at)gmail(dot)com>
To: Bruno Wolff III <bruno(at)wolff(dot)to>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Additional select fields in a GROUP BY
Date: 2004-06-13 17:35:36
Message-ID: fa96e3c604061310354113e9f8@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Bruno:

It wasn't exactly my case but you did give me an idea by this tip,
changing a perspective did quite good to the timing of this query.

Tom:

Hmm.. I am not sure how I can demonstrate this to you... To see the
time differences you'd need the whole table.. That's quite a lot of
data to be posted on a mailing list, if you wish to test it on your
side, I'll dump this table partly and send them to you somehow.

I do stand by what I said though, here's the real query example:

Original query (execution time, 800ms):

select s.series_id, avg(vote_avg), sum(vote_count)
from v_bookseries s, bv_seriesgenres sg
where s.series_id = sg.series_id and sg.genre_id = 1
group by s.series_id
order by sum(vote_count) desc
limit 10

QUERY PLAN:

Limit (cost=6523.51..6523.53 rows=10 width=12)
-> Sort (cost=6523.51..6566.27 rows=17104 width=12)
Sort Key: sum(b.vote_count)
-> GroupAggregate (cost=1368.54..5320.92 rows=17104 width=12)
-> Merge Join (cost=1368.54..4796.91 rows=58466 width=12)
Merge Cond: ("outer".series_id = "inner".series_id)
-> Merge Join (cost=0.00..6676.41 rows=65902 width=16)
Merge Cond: ("outer".series_id = "inner".series_id)
-> Index Scan using bv_series_pkey on
bv_series s (cost=0.00..386.83 rows=17104 width=4)
-> Index Scan using i_books_series_id on
bv_books b (cost=0.00..14148.38 rows=171918 width=12)
-> Sort (cost=1368.54..1406.47 rows=15173 width=4)
Sort Key: sg.series_id
-> Index Scan using i_seriesgenres_genre_id
on bv_seriesgenres sg (cost=0.00..314.83 rows=15173 width=4)
Index Cond: (genre_id = 1)

Query with added GROUP BY members (execution time, 1400ms):

select s.series_id, s.series_name, s.series_picture, avg(vote_avg),
sum(vote_count)
from v_bookseries s, bv_seriesgenres sg
where s.series_id = sg.series_id and sg.genre_id = 1
group by s.series_id, s.series_name, s.series_picture
order by sum(vote_count) desc
limit 10

QUERY PLAN:

Limit (cost=12619.76..12619.79 rows=10 width=47)
-> Sort (cost=12619.76..12662.52 rows=17104 width=47)
Sort Key: sum(b.vote_count)
-> GroupAggregate (cost=10454.67..11417.18 rows=17104 width=47)
-> Sort (cost=10454.67..10600.83 rows=58466 width=47)
Sort Key: s.series_id, s.series_name, s.series_picture
-> Merge Join (cost=1368.54..4796.91 rows=58466 width=47)
Merge Cond: ("outer".series_id = "inner".series_id)
-> Merge Join (cost=0.00..6676.41
rows=65902 width=51)
Merge Cond: ("outer".series_id =
"inner".series_id)
-> Index Scan using bv_series_pkey on
bv_series s (cost=0.00..386.83 rows=17104 width=39)
-> Index Scan using i_books_series_id
on bv_books b (cost=0.00..14148.38 rows=171918 width=12)
-> Sort (cost=1368.54..1406.47 rows=15173 width=4)
Sort Key: sg.series_id
-> Index Scan using
i_seriesgenres_genre_id on bv_seriesgenres sg (cost=0.00..314.83
rows=15173 width=4)
Index Cond: (genre_id = 1)

Notice that the GROUP BY items added the following to the plan:

-> GroupAggregate (cost=10454.67..11417.18 rows=17104 width=47)
-> Sort (cost=10454.67..10600.83 rows=58466 width=47)
Sort Key: s.series_id, s.series_name, s.series_picture

Which eventually almost doubles the execution time.

On Sun, 13 Jun 2004 08:52:12 -0500, Bruno Wolff III <bruno(at)wolff(dot)to> wrote:
>
> On Sun, Jun 13, 2004 at 06:21:17 +0300,
> Vitaly Belman <vitalib(at)012(dot)net(dot)il> wrote:
> >
> > Consider the following query:
> >
> > select t1field1, avg(t2fieild2)
> > from t1, t2
> > where t1.field1 = t2.field2
> > group by t1field1
> >
> > That works fine. But I'd really like to see more fields of t1 in this
> > query, however I can't add them into the select because they're not
> > part of the GROUP BY, thus I have to add them to there too:
>
> If t1.field1 is a candiate key for t1, then the normal thing to do is
> to group t2 by t2.field1 (assuming you really meant to join on t2.field1,
> not t2.field2) and THEN join to t1. That may even be faster than the way you
> are doing things now.
>
> So the query would look like:
>
> SELECT t1.field1, t1.field2, t1.field3, a.t2avg FROM t1,
> (SELECT field1, avg(field2) as t2avg FROM t2 GROUP BY field1) as a
> WHERE t1.field1 = a.field1
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: don't forget to increase your free space map settings
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2004-06-13 17:59:24 Re: Additional select fields in a GROUP BY
Previous Message Bruno Wolff III 2004-06-13 13:52:12 Re: Additional select fields in a GROUP BY