Re: Assertion failure with barriers in parallel hash join

From: David Geier <geidav(dot)pg(at)gmail(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>
Subject: Re: Assertion failure with barriers in parallel hash join
Date: 2022-06-02 09:30:31
Message-ID: 165416223138.23516.14510796482570365828.pgcf@coridan.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: not tested

Hi all,

We recently encountered the same bug in the field. Oleksii Kozlov managed to come up with reproduction steps, which reliably trigger it. Interestingly, the bug does not only manifest as failing assertion, but also as segmentation fault; in builds with disabled and with enabled (!) assertions. So it can crash production environments. We applied the proposed patch v3 from Melanie to the REL_14_3 branch and can confirm that with the patch neither the assertion nor the segmentation fault still occur.

I have also glanced at the code and the implementation looks fine. However, I'm not an expert for the fairly involved hash join state machine.
There seems to be no need for additional documentation.

For completeness here is the stack trace of the segmentation fault.
Like the stack trace from the assertion failure initially shared by Michael and also encountered by us, the stack trace of the segmentation fault also contains ExecParallelHashJoinNewBatch().

#9 | Source "/opt/src/backend/executor/execMain.c", line 361, in standard_ExecutorRun
| Source "/opt/src/backend/executor/execMain.c", line 1551, in ExecutePlan
Source "/opt/src/include/executor/executor.h", line 257, in ExecProcNode [0x657e4d]
#8 | Source "/opt/src/backend/executor/nodeAgg.c", line 2179, in ExecAgg
Source "/opt/src/backend/executor/nodeAgg.c", line 2364, in agg_retrieve_direct [0x66ba60]
#7 | Source "/opt/src/backend/executor/nodeAgg.c", line 581, in fetch_input_tuple
Source "/opt/src/include/executor/executor.h", line 257, in ExecProcNode [0x66d585]
#6 | Source "/opt/src/backend/executor/nodeHashjoin.c", line 607, in ExecParallelHashJoin
| Source "/opt/src/backend/executor/nodeHashjoin.c", line 560, in ExecHashJoinImpl
Source "/opt/src/backend/executor/nodeHashjoin.c", line 1132, in ExecParallelHashJoinNewBatch [0x67a89b]
#5 | Source "/opt/src/backend/storage/ipc/barrier.c", line 242, in BarrierAttach
Source "/opt/src/include/storage/s_lock.h", line 228, in tas [0x7c2a1b]
#4 Object "/lib/x86_64-linux-gnu/libpthread.so.0", at 0x7f4db364841f, in __funlockfile

--
David Geier
(SericeNow)

The new status of this patch is: Ready for Committer

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message wangw.fnst@fujitsu.com 2022-06-02 10:01:37 RE: Perform streaming logical transactions by background workers and parallel apply
Previous Message Etsuro Fujita 2022-06-02 09:14:47 Re: doc: CREATE FOREIGN TABLE .. PARTITION OF .. DEFAULT