pgsql: Fix race in Parallel Hash Join batch cleanup.

From: Thomas Munro <tmunro(at)postgresql(dot)org>
To: pgsql-committers(at)lists(dot)postgresql(dot)org
Subject: pgsql: Fix race in Parallel Hash Join batch cleanup.
Date: 2021-03-17 05:15:36
Message-ID: E1lMOWu-00020E-UJ@gemulon.postgresql.org
Lists: pgsql-committers

Fix race in Parallel Hash Join batch cleanup.

With very unlucky timing and parallel_leader_participation off, Parallel
Hash Join (PHJ) could attempt to access per-batch state just as it was
being freed. There was code intended to prevent that by checking for a
cleared pointer, but it was buggy.
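
A NULL-pointer guard of the form "check, then use" cannot close this kind of window on its own. The toy model below is a generic sketch of that hazard, not PostgreSQL's code: it enumerates where another participant's "free the array and clear the pointer" step can land relative to a reader's two steps, and reports the interleavings in which the reader passes the check but then touches freed memory. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Reader steps:  0 = load the shared pointer and check it is not NULL,
 *                1 = dereference the pointer it loaded.
 * Freer step F:  free the array and clear the pointer (one step here).
 * free_pos says where F runs: 0 = before the check, 1 = between the
 * check and the use, 2 = after the use.
 */
static bool reader_touches_freed_memory(int free_pos)
{
    bool freed = false;        /* F has run: memory freed, pointer cleared */
    bool passed_check = false; /* the reader saw a non-NULL pointer */

    assert(free_pos >= 0 && free_pos <= 2);
    for (int step = 0; step <= 2; step++)
    {
        if (step == free_pos)
            freed = true;
        if (step == 0)
            passed_check = !freed;
        else if (step == 1 && passed_check && freed)
            return true;       /* use-after-free: the check was stale */
    }
    return false;
}

/* Count the interleavings in which the NULL check fails to protect the reader. */
static int count_hazardous_interleavings(void)
{
    int hazards = 0;

    for (int pos = 0; pos <= 2; pos++)
        if (reader_touches_freed_memory(pos))
            hazards++;
    return hazards;
}
```

Only the interleaving where F lands between the check and the use is hazardous, which is why the fix ties "is it still safe?" to the same atomic step that registers the reader.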

Fix this by introducing an extra barrier phase. The new phase
PHJ_BUILD_RUNNING means that it's safe to access the per-batch state to
find a batch to help with, and PHJ_BUILD_DONE means that it is too late.
The last to detach will free the array of per-batch state as before, but
now it will also atomically advance the phase at the same time, so that
late attachers can see that it is too late and avoid the hazard without
a data race. This mirrors the way per-batch hash tables are freed (see
phases PHJ_BATCH_PROBING and PHJ_BATCH_DONE).
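
The protocol described above can be sketched in a simplified, self-contained form. This is not PostgreSQL's Barrier API (BarrierAttach() and friends are not reproduced); as a swapped-in technique it packs the phase and the participant count into one C11 atomic word, so that "attach and observe the phase" and "last one out advances to DONE" are each a single atomic step. The names shared_build, build_attach(), build_detach(), try_help(), BUILD_RUNNING and BUILD_DONE are hypothetical, and an int array stands in for the per-batch state.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical phase numbers, named after the commit's phases. */
enum { BUILD_RUNNING = 1, BUILD_DONE = 2 };

/* Phase in the high bits, participant count in the low 16 bits (assumes
 * fewer than 65536 participants), so both change in one atomic step. */
#define COUNT_BITS 16u
#define COUNT_MASK ((1u << COUNT_BITS) - 1u)

typedef struct
{
    _Atomic unsigned state;    /* (phase << COUNT_BITS) | participants */
    int     *batches;          /* stands in for the per-batch state array */
    int      nbatches;
} shared_build;

static unsigned pack(unsigned phase, unsigned count)
{
    return (phase << COUNT_BITS) | count;
}

static unsigned build_phase(shared_build *sb)
{
    return atomic_load(&sb->state) >> COUNT_BITS;
}

/* Creator: allocate the array and attach itself in phase BUILD_RUNNING. */
static bool build_init(shared_build *sb, int nbatches)
{
    assert(nbatches > 0);
    sb->batches = malloc(sizeof(int) * (size_t) nbatches);
    if (sb->batches == NULL)
        return false;
    for (int i = 0; i < nbatches; i++)
        sb->batches[i] = i + 1;
    sb->nbatches = nbatches;
    atomic_init(&sb->state, pack(BUILD_RUNNING, 1));
    return true;
}

/* Attach and return the phase observed in the same atomic step. */
static unsigned build_attach(shared_build *sb)
{
    return atomic_fetch_add(&sb->state, 1u) >> COUNT_BITS;
}

/* Detach. The last participant in BUILD_RUNNING moves to BUILD_DONE in
 * the same CAS, then frees the array. Returns true if it did the cleanup. */
static bool build_detach(shared_build *sb)
{
    unsigned old = atomic_load(&sb->state);

    for (;;)
    {
        bool last = (old & COUNT_MASK) == 1 && (old >> COUNT_BITS) == BUILD_RUNNING;
        unsigned next = last ? pack(BUILD_DONE, 0) : old - 1;

        if (atomic_compare_exchange_weak(&sb->state, &old, next))
        {
            if (last)
            {
                /* Safe: every reader that saw BUILD_RUNNING was counted,
                 * and anyone attaching from now on sees BUILD_DONE. */
                free(sb->batches);
                sb->batches = NULL;
            }
            return last;
        }
        /* CAS failed: old now holds the current value, so retry. */
    }
}

/* A late helper: only touch the array if attach observed BUILD_RUNNING. */
static long try_help(shared_build *sb)
{
    long sum = -1;             /* -1 means "too late, array not touched" */

    if (build_attach(sb) == BUILD_RUNNING)
    {
        sum = 0;
        for (int i = 0; i < sb->nbatches; i++)
            sum += sb->batches[i];
    }
    build_detach(sb);
    return sum;
}
```

The design choice mirrors the commit text: a helper never inspects a pointer to decide whether the array still exists; it relies on the phase it observed while registering itself, and the freeing participant changes that phase in the same atomic step as its final detach.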

Revealed by a one-off build farm failure in which BarrierAttach() failed a
sanity-check assertion because the memory had been clobbered by
dsa_free().
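
The failure mode above can be illustrated in miniature, assuming a debug build that overwrites freed memory with a fill byte rather than leaving it intact. In the sketch below, the toy_barrier type, its fields, toy_barrier_looks_sane() and clobber() are all hypothetical, 0x7F is simply this sketch's choice of fill byte, and the check is only in the spirit of an attach-time sanity assertion, not BarrierAttach()'s actual code. To avoid undefined behavior, the memory is clobbered in place rather than actually freed and then read.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical barrier-like header; the field names are illustrative. */
typedef struct
{
    int phase;
    int participants;
    int fixed_party;    /* 0 = participants attach and detach dynamically */
} toy_barrier;

static void toy_barrier_init(toy_barrier *b)
{
    b->phase = 0;
    b->participants = 0;
    b->fixed_party = 0;
}

/* Models a debug build wiping memory on free with a fill byte. */
static void clobber(void *p, size_t size)
{
    assert(p != NULL);
    memset(p, 0x7F, size);
}

/* The kind of invariant an attach routine might assert on entry. */
static bool toy_barrier_looks_sane(const toy_barrier *b)
{
    return b->fixed_party == 0 && b->participants >= 0 && b->phase >= 0;
}
```

After clobbering, every int field reads as 0x7F7F7F7F, so the fixed_party check fails; that is a cheap way for a late attacher's sanity check to notice it is looking at memory that has already been given back.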

Back-patch to 11, where the code arrived.

Reported-by: Michael Paquier <michael(at)paquier(dot)xyz>
Discussion: https://postgr.es/m/20200929061142.GA29096%40paquier.xyz

Branch
------
REL_11_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/0129c56fbe5c26bfec91bfc2c8a3b8818f441d6e

Modified Files
--------------
src/backend/executor/nodeHash.c | 47 +++++++++++++++++++++++++------------
src/backend/executor/nodeHashjoin.c | 40 ++++++++++++++++++-------------
src/include/executor/hashjoin.h | 3 ++-
3 files changed, 58 insertions(+), 32 deletions(-)
