Re: Solving hash table overrun problems

From: Tom Lane Bruno Wolff III pgsql-hackers(at)postgreSQL(dot)org Re: Solving hash table overrun problems 2005-03-04 15:42:08 27706.1109950928@sss.pgh.pa.us (view raw, whole thread or download thread mbox) 2005-03-03 22:05:40 from Tom Lane  2005-03-04 06:05:37 from Aaron Birkland   2005-03-04 14:52:46 from Tom Lane    2005-03-07 00:46:27 from Aaron Birkland  2005-03-04 15:46:07 from Bruno Wolff III   2005-03-04 15:42:08 from Tom Lane    2005-03-04 16:22:01 from Bruno Wolff III     2005-03-04 16:12:43 from Tom Lane pgsql-hackers
```Bruno Wolff III <bruno(at)wolff(dot)to> writes:
>   Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> * Estimate the number of batches N using the planner's estimate.
>> We will always choose N a power of 2.  A tuple's batch number is
>> ((H div K) mod N).

> If K is way low this could be very slow.

How so?  You're not concerned about the time to do the division itself
are you?

>> * Now begin scanning the outer join input.  Tuples of batch number
>> zero (according to the current calculation) can be matched to the
>> current hashtable contents.  Tuples of higher batch numbers are dropped
>> into holding files for the outer input, one per batch.

> For new keys at this step do you know their final disposition so that making
> new hash entries won't be necessary?

Well, we probably have a pretty fair idea of N at this point, so it'll
usually be right --- but we reserve the right to increase N again later
in case we have to do so because one of the later inner batches is much
bigger than the zero batch we are currently processing.

regards, tom lane

```

pgsql-hackers by date

 Next: From: Bruno Wolff III Date: 2005-03-04 15:46:07 Subject: Re: Solving hash table overrun problems Previous: From: Tom Lane Date: 2005-03-04 15:28:24 Subject: Re: bitmap AM design