Quick Links

Re: Parallel Full Hash Join

From:	Melanie Plageman <melanieplageman(at)gmail(dot)com>
To:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc:	Justin Pryzby <pryzby(at)telsasoft(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Parallel Full Hash Join
Date:	2022-01-11 21:30:37
Message-ID:	CAAKRu_YX8PdxHbqV4wEk6v-ivMTwN+tcYG8AtTe3HT3gz=ewmA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Fri, Nov 26, 2021 at 3:11 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
>
> On Sun, Nov 21, 2021 at 4:48 PM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> > On Wed, Nov 17, 2021 at 01:45:06PM -0500, Melanie Plageman wrote:
> > > Yes, this looks like that issue.
> > >
> > > I've attached a v8 set with the fix I suggested in [1] included.
> > > (I added it to 0001).
> >
> > This is still crashing :(
> > https://cirrus-ci.com/task/6738329224871936
> > https://cirrus-ci.com/task/4895130286030848
>
> I added a core file backtrace to cfbot's CI recipe a few days ago, so
> now we have:
>
> https://cirrus-ci.com/task/5676480098205696
>
> #3 0x00000000009cf57e in ExceptionalCondition (conditionName=0x29cae8
> "BarrierParticipants(&accessor->shared->batch_barrier) == 1",
> errorType=<optimized out>, fileName=0x2ae561 "nodeHash.c",
> lineNumber=lineNumber(at)entry=2224) at assert.c:69
> No locals.
> #4 0x000000000071575e in ExecParallelScanHashTableForUnmatched
> (hjstate=hjstate(at)entry=0x80a60a3c8,
> econtext=econtext(at)entry=0x80a60ae98) at nodeHash.c:2224

I believe this assert can be safely removed.

It is possible for a worker to attach to the batch barrier after the
"last" worker was elected to scan and emit unmatched inner tuples. This
is safe because the batch barrier is already in phase PHJ_BATCH_SCAN
and this newly attached worker will simply detach from the batch
barrier and look for a new batch to work on.

The order of events would be as follows:

W1: advances batch to PHJ_BATCH_SCAN
W2: attaches to batch barrier in ExecParallelHashJoinNewBatch()
W1: calls ExecParallelScanHashTableForUnmatched() (2 workers attached to
barrier at this point)
W2: detaches from the batch barrier

The attached v10 patch removes this assert and updates the comment in
ExecParallelScanHashTableForUnmatched().

I'm not sure if I should add more detail about this scenario in
ExecParallelHashJoinNewBatch() under PHJ_BATCH_SCAN or if the detail in
ExecParallelScanHashTableForUnmatched() is sufficient.

- Melanie

Attachment	Content-Type	Size
v10-0002-Improve-the-naming-of-Parallel-Hash-Join-phases.patch	text/x-patch	25.3 KB
v10-0001-Fix-race-condition-in-parallel-hash-join-batch-c.patch	text/x-patch	11.0 KB
v10-0003-Parallel-Hash-Full-Right-Outer-Join.patch	text/x-patch	24.9 KB

In response to

Re: Parallel Full Hash Join at 2021-11-26 20:11:21 from Thomas Munro

Responses

Re: Parallel Full Hash Join at 2022-04-08 11:29:32 from Thomas Munro

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Andrew Dunstan	2022-01-11 21:35:41	Re: Windows crash / abort handling
Previous Message	Alvaro Herrera	2022-01-11 21:30:11	Re: row filtering for logical replication