Quick Links

Re: Solving hash table overrun problems

From:	Bruno Wolff III <bruno(at)wolff(dot)to>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-hackers(at)postgreSQL(dot)org
Subject:	Re: Solving hash table overrun problems
Date:	2005-03-04 16:22:01
Message-ID:	20050304162201.GA22428@wolff.to
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Fri, Mar 04, 2005 at 10:42:08 -0500,
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Bruno Wolff III <bruno(at)wolff(dot)to> writes:
> > Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> * Estimate the number of batches N using the planner's estimate.
> >> We will always choose N a power of 2. A tuple's batch number is
> >> ((H div K) mod N).
>
> > If K is way low this could be very slow.
>
> How so? You're not concerned about the time to do the division itself
> are you?

No, rather having lots of entries in the same hash buckets. I was thinking
about recent discussions were there was a large number of rows with almost
all of the keys having just a few values, but there are a lot of unique
keys, but analyze doesn't see enough of the unique ones to make a good
estimate for K.

> >> * Now begin scanning the outer join input. Tuples of batch number
> >> zero (according to the current calculation) can be matched to the
> >> current hashtable contents. Tuples of higher batch numbers are dropped
> >> into holding files for the outer input, one per batch.
>
> > For new keys at this step do you know their final disposition so that making
> > new hash entries won't be necessary?
>
> Well, we probably have a pretty fair idea of N at this point, so it'll
> usually be right --- but we reserve the right to increase N again later
> in case we have to do so because one of the later inner batches is much
> bigger than the zero batch we are currently processing.

I just noticed that it wasn't mentioned that an overflow could occur at this
step. I didn't think it would be hard to do one if needed, but was wondering
if knowing a key couldn't match (because it was in the current batch 0 and
didn't match and existing key in that batch) was enough to emit or discard
the row.

In response to

Re: Solving hash table overrun problems at 2005-03-04 15:42:08 from Tom Lane

Responses

Re: Solving hash table overrun problems at 2005-03-04 16:12:43 from Tom Lane

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	pgsql	2005-03-04 17:55:04	Re: bitmap AM design
Previous Message	Bruce Momjian	2005-03-04 16:20:30	I am in Copenhagen