Re: DATE_TRUNC() and GROUP BY?

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Dave Johansen <davejohansen(at)gmail(dot)com>
Cc: pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: DATE_TRUNC() and GROUP BY?
Date: 2013-12-21 05:46:22
Message-ID: CAApHDvp2vH=7O-gp-zAf7aWy+A-WHWVg7h3Vc6=5pf9Uf34DhQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Fri, Dec 20, 2013 at 1:35 PM, Dave Johansen <davejohansen(at)gmail(dot)com>wrote:

> I just ran into an interesting issue on Postgres 8.4. I have a database
> with about 3 months of data and when I do following query:
> SELECT DATE_TRUNC('day', time) AS time_t, COUNT(*) FROM mytable GROUP BY
> time_t;
>
> EXPLAIN shows that it's doing a sort and then a GroupAggregate. There will
> only be ~90 outputs, so is there a way I can hint/force the planner to just
> do a HashAggregate?
>
> Just to see if it would change the plan, I tried increasing the work_mem
> up to 1GB and it still did the same plan.
>
>
PostgreSQL does not really have any stats on the selectivity of
date_trunc('day', time) so my guess is that it can only assume that it has
the same selectivity as the time column by itself... Which is very untrue
in this case.
The group aggregate plan is chosen here as PostgreSQL thinks the the hash
table is going to end up pretty big and decides that the group aggregate
will be the cheaper option.

I mocked up your data and on 9.4 I can get the hash aggregate plan to run
if I set the n_distinct value to 90 then analyze the table again.. Even if
you could do this on 8.4 I'd not recommend it as it will probably cause
havoc with other plans around the time column. I did also get the hash
aggregate plan to run if I created a functional index on date_trunc('day',
time) then ran analyze again. I don't have a copy of 8.4 around to see if
the planner will make use of the index in the same way.

What would be really nice is if we could create our own statistics on what
we want, something like:

CREATE STATISTICS name ON table (date_trunc('day', time));

That way postgres could have a better idea of the selectivity in this
situation.

I'd give creating the function index a try, but keep in mind the overhead
that it will cause with inserts, updates and deletes.

Regards

David Rowley

> Thanks,
> Dave
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Алексей Кузнецов 2013-12-23 06:32:46 Strange number of rows in plan cost
Previous Message Gavin Flower 2013-12-20 21:19:21 Re: slow query - will CLUSTER help?