Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, James Coleman <jtc331(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash
Date: 2019-11-13 18:47:59
Message-ID: 20191113184759.gacxzbxk523vupfc@alap3.anarazel.de
Lists: pgsql-bugs

Hi,

On 2019-11-11 11:46:05 +0100, Tomas Vondra wrote:
> On Mon, Nov 11, 2019 at 01:33:41PM +1300, Thomas Munro wrote:
> Hmmm, interesting. Using a hard-coded constant seems a bit weird,
> although I don't have a precise argument for why it's wrong yet.
>
> Essentially, the patch does this:
>
> batchno = hash_combine(0x9e3779b9, hashvalue) & (nbatch - 1);
> bucketno = hashvalue & (nbuckets - 1);
>
> but adding a constant does not 'propagate' any bits from the original
> value to the hash. So it seems pretty much equivalent to

I was arguing over IM that we ought to also use a different mixing
function when computing the hash.

> batchno = hashvalue & (nbatch - 1);
> bucketno = hashvalue & (nbuckets - 1);

> so we still operate with the same number of bits. It might solve this
> particular case, but I doubt it's a good idea in general ...
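
To make that equivalence concrete, here's a quick self-contained check
- a sketch only, with hash_combine copied from the boost-style combiner
in src/include/utils/hashutils.h - showing that with a constant seed,
the low batch bits of the mixed value depend only on the low bits of
hashvalue, i.e. the same bits the bucket mask consumes:

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* boost-style combiner, as in src/include/utils/hashutils.h */
    static inline uint32_t
    hash_combine(uint32_t a, uint32_t b)
    {
        a ^= b + 0x9e3779b9 + (a << 6) + (a >> 2);
        return a;
    }

    int
    main(void)
    {
        uint32_t mask = (1U << 10) - 1;    /* say, nbatch = 1024 */

        for (int i = 0; i < 1000000; i++)
        {
            uint32_t h = (uint32_t) random();
            /* flip only the bits above the batch mask */
            uint32_t h2 = h ^ 0xfffffc00;

            if ((hash_combine(0x9e3779b9, h) & mask) !=
                (hash_combine(0x9e3779b9, h2) & mask))
            {
                printf("high bits influence batchno after all\n");
                return 1;
            }
        }
        printf("batchno depends only on hashvalue's low bits\n");
        return 0;
    }
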
>
> I think we will have to actually compute two hashes, to get more than 32
> bits. But yes, that'll be more invasive, and unlikely backpatchable.

I think it might also just generally not be acceptable, from a
performance POV. Computing the hashes is already a major bottleneck for
HJ. Penalizing everybody for these extreme cases doesn't strike me as
great. It might be ok to compute a 64-bit hash, but I don't see how
computing two hashes in parallel wouldn't have significant overhead.
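
Purely to illustrate the bit budget - a hypothetical sketch, not a
proposal, and not how ExecHashGetBucketAndBatch() works today - a
single 64-bit hash would let bucket and batch draw from disjoint bits:

    #include <stdint.h>

    /*
     * Hypothetical split of one 64-bit hash: the low 32 bits feed the
     * bucket mask, the high 32 bits feed the batch mask, so the two
     * never compete for the same bits.  Names are made up for
     * illustration only.
     */
    static void
    get_bucket_and_batch_64(uint64_t hashvalue64,
                            int nbuckets, int nbatch,
                            int *bucketno, int *batchno)
    {
        *bucketno = (int) ((uint32_t) hashvalue64 & (nbuckets - 1));
        *batchno = (int) ((uint32_t) (hashvalue64 >> 32) & (nbatch - 1));
    }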

I don't really see what the problem is with just using a differently
mixed hash. There's not really a requirement for there to be a
tremendous amount of independence between the bits used for the bucket
and the bits for the batch. We cannot just reuse exactly the same bits
for the bucket that we used for the batch, as otherwise we'd just raise
the number of bucket conflicts (as each batch's hashtable would only
contain tuples with a reduced number of distinct bits). But I don't see
why that implies a need for full bit independence. As long as the
mixing guarantees that we use the bits for bucketing differently than
the ones for batches, I think we're good. All that's needed is that
each batch is roughly equally sized, and contains tuples with hashes
that are roughly equally distributed - you could bitreverse the hash
for batch selection, and get that, no?
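
Something like this - just a sketch of the mixing idea, using the
standard parallel-swap bit reversal, not a worked-out patch:

    #include <stdint.h>

    /* classic 32-bit bit reversal via parallel swaps */
    static uint32_t
    reverse_bits(uint32_t v)
    {
        v = ((v >> 1) & 0x55555555) | ((v & 0x55555555) << 1);
        v = ((v >> 2) & 0x33333333) | ((v & 0x33333333) << 2);
        v = ((v >> 4) & 0x0f0f0f0f) | ((v & 0x0f0f0f0f) << 4);
        v = ((v >> 8) & 0x00ff00ff) | ((v & 0x00ff00ff) << 8);
        return (v >> 16) | (v << 16);
    }

    /*
     * Batch selection would then consume the hash's high bits while
     * the bucket mask keeps consuming the low ones:
     *
     *    batchno = reverse_bits(hashvalue) & (nbatch - 1);
     *    bucketno = hashvalue & (nbuckets - 1);
     */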

Greetings,

Andres Freund
