Re: Optimizer on sort aggregate

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc: Feng Tian <fengttt(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Optimizer on sort aggregate
Date: 2014-10-17 23:53:32
Message-ID: CAApHDvqfi5QKc8HCNmrE5vr=0dsE6DwuPPYK3pZuiYf2yVPLTQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Oct 18, 2014 at 12:35 PM, Tatsuo Ishii <ishii(at)postgresql(dot)org> wrote:

> > The query,
> > select count(distinct j) from t group by t, i;
> >
> > runs for 35 seconds. However, if I change the query to,
> > select count(distinct j) from t group by i, t; -- note switching the
> > ordering
> > select count(distinct j) from t group by decode(t, 'escape'), i; --
> convert
> > t to bytea
> >
> > Run times are just about 5 and 6.5 seconds. The reason is clear,
> compare a
> > string with collation is slow, which is well understood by pg hackers.
> > However, here, the sorting order is forced by the planner, not user.
> > Planner can do the following optimizations,
>
> Interesting. I got following result:
>
> test=# explain analyze select count(distinct j) from t group by t, i;
> QUERY PLAN
>
> --------------------------------------------------------------------------------------------------------------------------
> GroupAggregate (cost=137519.84..157519.84 rows=1000000 width=22) (actual
> time=1332.937..2431.238 rows=1000000 loops=1)
> Group Key: t, i
> -> Sort (cost=137519.84..140019.84 rows=1000000 width=22) (actual
> time=1332.922..1507.413 rows=1000000 loops=1)
> Sort Key: t, i
> Sort Method: external merge Disk: 33232kB
> -> Seq Scan on t (cost=0.00..17352.00 rows=1000000 width=22)
> (actual time=0.006..131.406 rows=1000000 loops=1)
> Planning time: 0.031 ms
> Execution time: 2484.271 ms
> (8 rows)
>
> Time: 2484.520 ms
>
> test=# explain analyze select count(distinct j) from t group by i, t;
> QUERY PLAN
>
> --------------------------------------------------------------------------------------------------------------------------
> GroupAggregate (cost=137519.84..157519.84 rows=1000000 width=22) (actual
> time=602.510..1632.087 rows=1000000 loops=1)
> Group Key: i, t
> -> Sort (cost=137519.84..140019.84 rows=1000000 width=22) (actual
> time=602.493..703.274 rows=1000000 loops=1)
> Sort Key: i, t
> Sort Method: external sort Disk: 33240kB
> -> Seq Scan on t (cost=0.00..17352.00 rows=1000000 width=22)
> (actual time=0.014..129.213 rows=1000000 loops=1)
> Planning time: 0.176 ms
> Execution time: 1685.575 ms
> (8 rows)
>
> Time: 1687.641 ms
>
> Not so big difference here (maybe because I use SSD) but there is
> still about 50% difference in execution time. Note that I disable
> locale support.
>
>
I think this is more likely your locale settings, as if I do:

create table t(i int, j int, k int, t text collate "C");
The GROUP BY t,i runs about 25% faster.

I've not looked at it yet, but Peter G's patch here
https://commitfest.postgresql.org/action/patch_view?id=1462 will quite
likely narrow the performance gap between the 2 queries.

Regards

David Rowley

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2014-10-18 00:03:18 Re: Materialized views don't show up in information_schema
Previous Message David Rowley 2014-10-17 23:36:30 Re: Optimizer on sort aggregate