Re: PATCH: postpone building buckets to the end of Hash (in HashJoin)

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: postpone building buckets to the end of Hash (in HashJoin)
Date: 2015-12-17 18:20:44
Message-ID: CA+TgmoZbzfF8eCwY536Umr=uyvR_zHO-xL0xzDQPDshH6yWBkg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Dec 14, 2015 at 3:04 PM, Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> attached is v1 of one of the hashjoin improvements mentioned in September in
> the lengthy thread [1].
>
> The main objection against simply removing the MaxAllocSize check (and
> switching to MemoryContextAllocHuge) is that if the number of rows is
> overestimated, we may consume significantly more memory than necessary.
>
> We run into this issue because we allocate the buckets at the very
> beginning, based on the estimate. I've noticed we don't really need to do
> that - we don't really use the buckets until after the Hash node completes,
> and we don't even use it when incrementing the number of batches (we use the
> dense allocation for that).
>
> So this patch removes this - it postpones allocating the buckets to the end
> of MultiExecHash(), and at that point we know pretty well what is the
> optimal number of buckets.
>
> This makes tracking nbuckets_optimal/log2_nbuckets_optimal unnecessary, as
> we can simply use nbuckets/log2_nbuckets for that purpose. I've also removed
> nbuckets_original, but maybe that's not a good idea and we want to keep that
> information (OTOH we never use that number of buckets).
>
> This patch does not change the estimation in ExecChooseHashTableSize() at
> all, because we still need to do that to get nbucket/nbatch. Maybe this is
> another opportunity for improvement in case of overestimates, because in
> that case it may happen that we do batching even when we could do without
> it. So we might start with nbuckets=1024 and nbatches=1, and only switch to
> the estimated number of batches if really needed.
>
> This patch also does not mess with the allocation, i.e. it still uses the
> MaxAllocSize limit (which amounts to ~256MB due to the doubling, IIRC), but
> it should make it easier to do that change.

If this doesn't regress performance in the case where the number of
buckets is estimated accurately to begin with, then I think this is a
great idea. Can you supply some performance tests results for that
case, and maybe some of the other cases also?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2015-12-17 18:38:44 Re: Support for N synchronous standby servers - take 2
Previous Message Andres Freund 2015-12-17 18:20:23 Re: Fwd: Cluster "stuck" in "not accepting commands to avoid wraparound data loss"