Quick Links

Planner having way wrong estimate for group aggregate

From:	"Steinar H(dot) Gunderson" <sgunderson(at)bigfoot(dot)com>
To:	pgsql-performance(at)postgresql(dot)org
Subject:	Planner having way wrong estimate for group aggregate
Date:	2004-09-18 17:01:17
Message-ID:	20040918170117.GA3354@uio.no
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

Hi,

I'm using PostgreSQL 7.4 on a table with ~700.000 rows looking like this:

And I'm doing the query (after VACUUM ANALYZE)

smt=# explain analyze select sum(belop) as omsetning,date_trunc('day',tid) as dato from enkeltsalg group by date_trunc('day',tid);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=108062.34..114477.98 rows=172735 width=17) (actual time=20977.544..23890.020 rows=361 loops=1)
-> Sort (cost=108062.34..109912.99 rows=740263 width=17) (actual time=20947.372..21627.107 rows=710720 loops=1)
Sort Key: date_trunc('day'::text, tid)
-> Seq Scan on enkeltsalg (cost=0.00..18010.29 rows=740263 width=17) (actual time=0.091..7180.528 rows=710720 loops=1)
Total runtime: 23908.538 ms
(5 rows)

Now, as you can see, the GroupAggregate here is _way_ off, so the planner
makes the wrong choice (it should do a hash aggregate). If I set sort_mem to
131072 instead of 16384, it does a hash aggregate (which is 10 seconds
instead of 24), but I can't have sort_mem that high generally.

Now, my first notion was creating a functional index to help the planner:

smt=# create index enkeltsalg_dag on enkeltsalg ( date_trunc('day',tid) );
CREATE INDEX
smt=# vacuum analyze;
VACUUM

However, this obviously didn't help the planner (this came as a surprise to
me, but probably won't come as a surprise to the more seasoned users here :-)
):

smt=# explain analyze select sum(belop) as omsetning,date_trunc('day',tid) as dato from enkeltsalg group by date_trunc('day',tid);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=103809.15..110017.11 rows=175512 width=17) (actual time=21061.357..23917.370 rows=361 loops=1)
-> Sort (cost=103809.15..105585.95 rows=710720 width=17) (actual time=21032.239..21695.674 rows=710720 loops=1)
Sort Key: date_trunc('day'::text, tid)
-> Seq Scan on enkeltsalg (cost=0.00..17641.00 rows=710720 width=17) (actual time=0.091..7231.387 rows=710720 loops=1)
Total runtime: 23937.791 ms
(5 rows)

I also tried to increase the statistics on the "tid" column:

smt=# alter table enkeltsalg alter column tid set statistics 500;
ALTER TABLE
smt=# analyze enkeltsalg;
ANALYZE

However, this made the planner only do a _worse_ estimate:

smt=# explain analyze select sum(belop) as omsetning,date_trunc('day',tid) as dato from enkeltsalg group by date_trunc('day',tid);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=107906.59..114449.09 rows=199715 width=17) (actual time=20947.197..23794.389 rows=361 loops=1)
-> Sort (cost=107906.59..109754.56 rows=739190 width=17) (actual time=20918.001..21588.735 rows=710720 loops=1)
Sort Key: date_trunc('day'::text, tid)
-> Seq Scan on enkeltsalg (cost=0.00..17996.88 rows=739190 width=17) (actual time=0.092..7166.488 rows=710720 loops=1)
Total runtime: 23814.624 ms
(5 rows)

Actually, it seems that the higher I set statistics on "tid", the worse the
estimate becomes.

Also, I was told (on #postgresql :-) ) to include the following information:

smt=# select n_distinct from pg_stats where attname='tid';
n_distinct
------------
-0.270181
(1 row)

Any ideas for speeding this up?

/* Steinar */
--
Homepage: http://www.sesse.net/

Responses

Re: Planner having way wrong estimate for group aggregate at 2004-09-18 19:48:13 from Tom Lane

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Shachar Shemesh	2004-09-18 17:02:48	Re: Tryint to match Solaris-Oracle performance with directio?
Previous Message	Tom Lane	2004-09-18 16:35:50	Re: Tryint to match Solaris-Oracle performance with directio?