Re: pgsql: Add parallel-aware hash joins.

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-committers <pgsql-committers(at)postgresql(dot)org>
Subject: Re: pgsql: Add parallel-aware hash joins.
Date: 2018-01-03 18:03:39
Lists: pgsql-committers pgsql-hackers

Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> writes:
> On Wed, Jan 3, 2018 at 2:38 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Hm. That could do it, except it doesn't really account for the observed
>> result that slower single-processor machines seem more prone to the
>> bug. Surely they should be less likely to get multiple workers activated.

> It plans for 8 batches, and then usually but not always goes to 16 at
> execution time depending on timing. It doesn't happen for me with
> 128kB (the setting used in the regression test), but I think the
> affected BF machines are all 32 bit systems that have MAXIMUM_ALIGNOF
> 8 and therefore use a bit more space, whereas my machines have
> MAXIMUM_ALIGNOF 4 in a 32 bit build, so that would explain the
> different location of this unstable border.

Ah, that might do it. You're right that gaur/pademelon are in that class.

> We could probably fix the
> failures by simply increasing work_mem out of that zone, but I'm
> hoping to fix the problem in a more principled way. More soon.

Meh. I think we're going to end up having to pick a modified test
case that's further away from the chunk-size threshold. I do not
think it is possible to predict this runtime behavior exactly at
plan time, nor am I convinced that expending planner cycles on a
somewhat closer estimate would be a good use of planning time.

>> I'm also wondering why the non-parallel path seems to prefer to allocate
>> in units of HASH_CHUNK_SIZE + HASH_CHUNK_HEADER_SIZE while the parallel
>> path targets allocations of exactly HASH_CHUNK_SIZE,

> That is intentional: dsa.c sucks at allocating 32KB + a tiny bit
> because it works in 4KB pages for large allocations, so I wanted to
> make HASH_CHUNK_SIZE the total size that arrives into dsa_allocate().
> The non-parallel path has similar problems on some libc
> implementations, as we discussed over here:

Oh, right, I'd forgotten about that discussion. I think it would be
good to adjust hashjoin so that both paths are targeting 32KB total;
but you are right that there's not a lot of point in fooling with the
non-parallel path until we add some way of accounting for aset.c's
overhead too. We can leave that for later.

>> BTW, I'm seeing a few things that look bug-like about
>> ExecParallelHashTuplePrealloc, ...

> Right. Fixed in the attached.

Pushed, plus an additional Assert to clarify that we're expecting
ExecParallelHashTuplePrealloc's caller to have already maxalign'ed
the request size.

regards, tom lane
