Re: Wrong results from Parallel Hash Full Join

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Richard Guo <guofenglinux(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Wrong results from Parallel Hash Full Join
Date: 2023-04-12 22:49:21
Message-ID: CA+hUKGKuLORE0OQ20sFrfdp=5tKVvamB-FnnCS03UZE+o8UFjQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Apr 13, 2023 at 9:48 AM Melanie Plageman
<melanieplageman(at)gmail(dot)com> wrote:
> Attached patch includes the fix for ExecParallelHashTableInsert() as
> well as a test. I toyed with adapting one of the existing parallel full
> hash join tests to cover this case, however, I think Richard's repro is
> much more clear. Maybe it is worth throwing in a few updates to the
> tables in the existing queries to provide coverage for the other
> HeapTupleHeaderClearMatch() calls in the code, though.

Oof. Analysis and code LGTM.

I thought about the way non-parallel HJ also clears the match bits
when re-using the hash table for rescans. PHJ doesn't keep hash
tables across rescans. (There's no fundamental reason why it
couldn't, but there was some complication and it seemed absurd to have
NestLoop over Gather over PHJ, forking a new set of workers for every
tuple, so I didn't implement that in the original PHJ.) But... there
is something a little odd about the code in
ExecHashTableResetMatchFlags(), or the fact that we appear to be
calling it: it's using the unshared union member unconditionally,
which wouldn't actually work for PHJ (there should be a variant of
that function with Parallel in its name if we ever want that to work).
That's not a bug AFAICT, as in fact we don't actually call it--it
should be unreachable because the hash table should be gone when we
rescan--but it's confusing. I'm wondering if we should put in
something explicit about that, maybe a comment and an assertion in
ExecReScanHashJoin().

+-- Ensure that hash join tuple match bits have been cleared before putting them
+-- into the hashtable.

Could you mention that the match flags steals a bit from the HOT flag,
ie *why* we're testing a join after an update? And if we're going to
exercise/test that case, should we do the non-parallel version too?

For the commit message, I think it's a good idea to use something like
"Fix ..." for the headline of bug fix commits to make that clearer,
and to add something like "oversight in commit XYZ" in the body, just
to help people connect the dots. (Yeah, I know I failed to reference
the delinquent commit in the recent assertion-removal commit, my bad.)
I think "Discussion:" footers are supposed to use
https://postgr.es/m/XXX shortened URLs.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2023-04-12 22:52:40 Re: Add LZ4 compression in pg_dump
Previous Message sirisha chamarthi 2023-04-12 22:43:40 Add ps display while waiting for wal in read_local_xlog_page_guts