Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, James Coleman <jtc331(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash
Date: 2019-11-11 00:33:41
Message-ID: CA+hUKG+2K6aZwjMNfq6i10_1jQmmcPokdYgBvFhJ1CjKYn2Ovw@mail.gmail.com
Lists: pgsql-bugs

On Mon, Nov 11, 2019 at 12:44 PM Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> On Sun, Nov 10, 2019 at 02:46:31PM -0800, Andres Freund wrote:
> >On 2019-11-10 22:50:17 +0100, Tomas Vondra wrote:
> >> On Sun, Nov 10, 2019 at 10:23:52PM +0100, Tomas Vondra wrote:
> >> > On Mon, Nov 11, 2019 at 10:08:58AM +1300, Thomas Munro wrote:
> >> > Can't we simply compute two hash values, using different seeds - one for
> >> > bucket and the other for batch? Of course, that'll be more expensive.
> >>
> >> Meh, I realized that's pretty much just a different way to get 64-bit
> >> hashes (which is what you mentioned).
> >
> >I'm not sure it's really the same, given practical realities in
> >postgres. Right now the "extended" hash function supporting 64 bit hash
> >functions is optional. So we couldn't unconditionally rely on it being
> >present, even in master, unless we're prepared to declare it as
> >required from now on.
> >
> >So computing two different hash values at the same time, by using a
> >different IV and a different combine function, doesn't seem like an
> >unreasonable approach.
>
> True. I was commenting on the theoretical fact that computing two 32-bit
> hashes is close to computing a 64-bit hash, but you're right there are
> implementation details that may make it more usable in our case.

Here is a quick sketch of something like that, for discussion only. I
figured that simply mixing the hash value we have with some arbitrary
bits afterwards would be just as good as having started with a
different IV, which leads to a very simple change without refactoring.
From quick experiments with unique keys (generate_series) I seem to
get approximately evenly sized partitions, and correct answers, but I
make no claim to strong hash-math-fu and haven't tested on very large
inputs. Thoughts?
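
For readers without the attachment, the idea described above can be
sketched roughly as follows. This is not the attached patch: the names
remix_hash, get_bucket_and_batch and REHASH_SALT are hypothetical, the
salt is an arbitrary constant, and the mixing step is the well-known
MurmurHash3 32-bit finalizer, swapped in here purely for illustration.
It assumes both nbuckets and nbatch are powers of two.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical salt: arbitrary bits mixed in before re-scrambling. */
#define REHASH_SALT UINT32_C(0x9e3779b9)

/*
 * Derive a second 32-bit value from an existing hash by XORing in some
 * arbitrary bits and then applying the MurmurHash3 32-bit finalizer.
 * Every step is invertible, so distinct inputs give distinct outputs.
 */
static inline uint32_t
remix_hash(uint32_t hash)
{
    uint32_t h = hash ^ REHASH_SALT;

    h ^= h >> 16;
    h *= UINT32_C(0x85ebca6b);
    h ^= h >> 13;
    h *= UINT32_C(0xc2b2ae35);
    h ^= h >> 16;
    return h;
}

/*
 * Pick the bucket from the original hash and the batch from the remixed
 * hash, instead of taking both from one set of 32 bits.
 * Assumption of this sketch: nbuckets and nbatch are powers of two.
 */
static inline void
get_bucket_and_batch(uint32_t hash, uint32_t nbuckets, uint32_t nbatch,
                     uint32_t *bucketno, uint32_t *batchno)
{
    assert(nbuckets > 0 && (nbuckets & (nbuckets - 1)) == 0);
    assert(nbatch > 0 && (nbatch & (nbatch - 1)) == 0);

    *bucketno = hash & (nbuckets - 1);
    *batchno = remix_hash(hash) & (nbatch - 1);
}
```

The only extra cost per tuple in this sketch is a handful of integer
operations on a hash value that has already been computed, which is why
no second call into the type's hash function is needed.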

Attachment Content-Type Size
rehash-partition.patch application/octet-stream 2.4 KB
