Re: Solving hash table overrun problems

From: Aaron Birkland <birkie(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Solving hash table overrun problems
Date: 2005-03-07 00:46:27
Message-ID: 19ab0ccd05030616463919926a@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> > This also brings up a line of thought I had a while ago on a related
> > topic. Something like a "HashDistinct" might be useful, if it had no
> > startup cost.

> We already have that: the planner will use a HashAgg node in this
> fashion in some contexts (I think just as one of the ways to do IN,
> at the moment).

Hmm.. I see HashAggregate being used that way in the IN queries, but I
have not observed it used in a way that incurrs no startup cost. It
looked to me that in doing hash aggregation in ExecAgg (nodeAgg.c),
agg_fill_hash_table() would have to be called, which iterate through
every output of the child plan building the hash table before it
returns, thus incurring at least the startup cost of executing the
entire subplan of the child node at the aggregation stage. I'm not
too familiar with the code, so there is probably something I'm missing
somewhere :(

> It's not yet bright enough to consider doing it for
> SELECT DISTINCT. The DISTINCT planning code is old and crufty and
> pretty tightly interwired with ORDER BY ... it needs work.

Yes, SELECT DISTINCT was my motivating example, though in my specific
application latency (i.e. not having to wait for the entire query
below the DISTINCT operation to finish) was also an important factor,
hence my thoughts on a zero startup cost hash aggregation and
wondering if it would really be any kind of win in the end.

-Aaron

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Sabino Mullane 2005-03-07 01:19:31 Re: Continue transactions after errors in psql
Previous Message mchron 2005-03-07 00:40:49 ERROR: unrecognized node type in PostgresMain( )