Re: memory usage of group by select

From: Anthony <osm(at)inbox(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: memory usage of group by select
Date: 2009-12-29 22:25:35
Message-ID: 71cd4dd90912291425q76b22e2ak94c432823c6dbfc7@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Tue, Dec 29, 2009 at 4:09 PM, Alvaro Herrera
<alvherre(at)commandprompt(dot)com>wrote:

> It's expecting 85k distinct groups. If that's not accurate, then
> HashAggregate would use more memory than expected.

Great diagnosis. There are actually about 76 million distinct groups.

> See if you can make it work by setting enable_hashagg = off.
>

Will do. Maybe overnight tonight. The explain is estimating that to take
10 times as long, and I can't afford to do that right now.

If that works, good -- the real solution is different. Maybe you need
> to ANALYZE more, or increase the size of the stats bin for this column.
>

On Tue, Dec 29, 2009 at 5:04 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> If ANALYZE consistently underestimates the number of distinct values,
> you may have to force matters with ALTER TABLE SET STATISTICS DISTINCT.
>

Seems to be an 8.5 feature. Otherwise, it'd be perfect since I happen to
know the actual number (it's the row count of another table).

I've run the analyze again and it's still estimating around 85K distinct
pids. That's with the default setting of 100 for default_statistics_target,
but I'm not sure I want to mess with that right now (this table is just
going to be dropped in a few days after I manipulate it a bit more).

The only question I really have is this: if I fix this number and leave
enable_hashagg on, is it just going to have the same effect (for this one
query) of turning enable_hashagg off? Because if I'm just going to have to
bite the bullet and run the query with GroupAggregate
(cost=287511359.15..297867743.71), I guess I'll just have to do that. Or
try to figure out another way to get what it is I'm trying to get (I should
have just imported the right numbers from the beginning, but the import took
3 days so I don't feel like doing that again).

In any case, thanks a lot for the help, both of you.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Nick 2009-12-29 22:28:19 Re: Installation of Postgis/postgresql
Previous Message Tom Lane 2009-12-29 22:04:01 Re: memory usage of group by select