Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, James Coleman <jtc331(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash
Date: 2019-11-20 00:05:56
Message-ID: CA+hUKGL21-qqT2GCGykSUzwY1wP7XvupZhX2Ve5kpcoX-rk_ww@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Thu, Nov 14, 2019 at 9:20 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> On Thu, Nov 14, 2019 at 7:48 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > I don't really see what the problem is with using just a differently
> > mixed hash. There's not really a requirement for there to be a
> > tremendous amount of independence between the bits used for the bucket,
> > and the bits for the batch. We cannot just reuse exactly the same bits
> > for the bucket that we used for the hash, as otherwise we'd just raise
> > the number of bucket conflicts (as the whole hashtable will only contain
> > tuples with the reduced number of bits). But I don't see why that
> > implies a need for full bit indepence. As long as the mixing guarantees
> > that we use the bits for bucketing differently than the ones for
> > batches, I think we're good. All that need is that each batch is is
> > roughly equally sized, and contains tuples with hashes that roughly
> > equally distributed - you could bitreverse the hash for batch selection,
> > and get that, no?
>
> Right, that's what I said earlier:
>
> > batchno = reverse_bits(hashvalue) & (nbatch - 1);
> > bucketno = hashvalue & (nbuckets - 1);
>
> My theory is that batchno = hash_combine(<constant>, hashvalue) &
> (nbatch - 1) has about the same distribution properties. That is, it
> works nicely up until you have more than about 2^32 keys, and then it
> begins to break down due to lack of information, but the break down
> manifests as more conflicts in buckets, instead of the failure to
> repartition once you start reading off the end of the hash value (the
> cause of the present bug report). The bit reverse approach might be a
> bit less mysterious, but when I looked at ways to implement that they
> looked potentially slower than the xor, shifts and adds used by
> hash_combine. I didn't attempt to test that, though.

If you have plenty of disk space and time, you can repro the problem like so:

create unlogged table sevenb as select generate_series(1, 7000000000)::bigint i;
analyze sevenb;
set work_mem = '150MB';
explain analyze select count(*) from sevenb t1 join sevenb t2 using (i);

If you have max_parallel_workers_per_gather >= 6 it gives up the
strange idea of using a sort-merge join, otherwise you also need
enable_mergejoin = off to prevent that, which is curious.

You can see as soon as it starts executing the join that it's only
writing to 1023 partitions (that is, 1023 * nprocesses temporary files
with names like "i...of4096.p...", and then when one of those gets too
big, "i...of8192.p...", and so on until something goes wrong). My
(slow) machine hasn't finished repartitioning yet... I'll check again
tomorrow, if it hasn't melted... Patched, it writes to 4095 as it
should (there is no "i0of4096..."), their sizes are uniform, and the
query completes after a single repartitioning operation. (Well it's
4096 with work_mem = 150MB and 2 workers, YMMV).

I'm planning to push something like this late this week (though I
think the constant might be better as 0 since it doesn't matter much
given the definition, and I need a back-patchable version that is open
coded or I need to back-patch hash_combine), unless someone has a
better idea.

Separately, I'm working on an idea for how to support extended hashing
for the master branch only.

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Andres Freund 2019-11-20 01:36:49 Re: REINDEX CONCURRENTLY unexpectedly fails
Previous Message Tom Lane 2019-11-19 23:31:11 Re: No = operator for opfamily 426