Re: Out of memory error during large hashagg

From: Casey Duncan <casey(at)pandora(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Out of memory error during large hashagg
Date: 2006-09-19 16:27:13
Message-ID: 8E8D36A0-61E4-46D1-BE1B-789E41D6CD4A@pandora.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs


On Sep 19, 2006, at 1:51 AM, Simon Riggs wrote:

> On Mon, 2006-09-18 at 14:08 -0700, Casey Duncan wrote:
>> I've reported variants of this in the past, but this case is entirely
>> repeatable.
>>
>> Executing this query:
>>
>> select st_id, min(seed_id) as "initial_seed_id", count(*) as
>> "seed_count"
>> from seed group by st_id;
>>
>> The query plan and table stats are:
>>
>> QUERY PLAN
>> ---------------------------------------------------------------------
>> --
>> HashAggregate (cost=1362694.83..1365164.68 rows=164656 width=16)
>> -> Seq Scan on seed (cost=0.00..964065.62 rows=53150562
>> width=16)
>>
>>
>> relname | relpages | reltuples
>> ---------+----------+-------------
>> seed | 428880 | 5.26984e+07
>>
>
> The hashagg doesn't yet have scroll to disk capability, so a bad
> estimation of ndistinct will cause this to fail (at any release). This
> is a known issue for certain distributions of data only. The
> workaround
> is the one you know about already: enable_hashagg = off

Ok, is that something for the TODO list? I took a glance and didn't
see it.

>
> I'm interested in collecting info on the distribution of data.
> Can you post:
>
> select tablename, attname, n_distinct from pg_stats
> where attname = 'st_id';

tablename | attname | n_distinct
--------------+---------+------------
st | st_id | -1
seed | st_id | 164656
feed | st_id | 14250
book | st_id | 14856
legacy_st | st_id | -1
(5 rows)

I ran analyze after this, but the results were roughly the same.

> select count(distinct st_id) from seed;

count
----------
40418083
(1 row)

Looks a tad bit different than the above ;^)

> and also the table definition, including the PK

Table "public.seed"
Column | Type | Modifiers
--------------+-----------------------------+---------------
seed_id | bigint | not null
mc_id | character varying(20) |
st_id | bigint |
date_created | timestamp without time zone | default now()
Indexes:
"seed_pkey" PRIMARY KEY, btree (seed_id)
"seed_st_mc_id_idx" UNIQUE, btree (mc_id, st_id)
"seed_mc_id_idx" btree (mc_id)
"seed_st_id" btree (st_id)
Foreign-key constraints:
"seed_st_id_fkey" FOREIGN KEY (st_id) REFERENCES st(st_id) ON
DELETE RESTRICT

Thanks

-Casey

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Massimo Fidanza 2006-09-21 08:40:51 Re: BUG #2632: createuser language poblem
Previous Message Tom Lane 2006-09-19 14:36:12 Re: Optimizer Bug of Debug Version