Re: Parallel Full Hash Join

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Ian Lawrence Barwick <barwick(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Full Hash Join
Date: 2023-04-08 17:30:24
Message-ID: CAAKRu_aNqaQC9vwts5g40PLOOA=o=KKHbcDuwzK4w82GVSh6XQ@mail.gmail.com
Lists: pgsql-hackers

On Sat, Apr 8, 2023 at 12:33 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
> > I committed the main patch.
>
> BTW, it was easy to miss in all the buildfarm noise from
> last-possible-minute patches, but chimaera just showed something
> that looks like a bug in this code [1]:
>
> 2023-04-08 12:25:28.709 UTC [18027:321] pg_regress/join_hash LOG: statement: savepoint settings;
> 2023-04-08 12:25:28.709 UTC [18027:322] pg_regress/join_hash LOG: statement: set local max_parallel_workers_per_gather = 2;
> 2023-04-08 12:25:28.710 UTC [18027:323] pg_regress/join_hash LOG: statement: explain (costs off)
> select count(*) from simple r full outer join simple s on (r.id = 0 - s.id);
> 2023-04-08 12:25:28.710 UTC [18027:324] pg_regress/join_hash LOG: statement: select count(*) from simple r full outer join simple s on (r.id = 0 - s.id);
> TRAP: failed Assert("BarrierParticipants(&batch->batch_barrier) == 1"), File: "nodeHash.c", Line: 2118, PID: 19147
> postgres: parallel worker for PID 18027 (ExceptionalCondition+0x84)[0x10ae2bfa4]
> postgres: parallel worker for PID 18027 (ExecParallelPrepHashTableForUnmatched+0x224)[0x10aa67544]
> postgres: parallel worker for PID 18027 (+0x3db868)[0x10aa6b868]
> postgres: parallel worker for PID 18027 (+0x3c4204)[0x10aa54204]
> postgres: parallel worker for PID 18027 (+0x3c81b8)[0x10aa581b8]
> postgres: parallel worker for PID 18027 (+0x3b3d28)[0x10aa43d28]
> postgres: parallel worker for PID 18027 (standard_ExecutorRun+0x208)[0x10aa39768]
> postgres: parallel worker for PID 18027 (ParallelQueryMain+0x2bc)[0x10aa4092c]
> postgres: parallel worker for PID 18027 (ParallelWorkerMain+0x660)[0x10a874870]
> postgres: parallel worker for PID 18027 (StartBackgroundWorker+0x2a8)[0x10ab8abf8]
> postgres: parallel worker for PID 18027 (+0x50290c)[0x10ab9290c]
> postgres: parallel worker for PID 18027 (+0x5035e4)[0x10ab935e4]
> postgres: parallel worker for PID 18027 (PostmasterMain+0x1304)[0x10ab96334]
> postgres: parallel worker for PID 18027 (main+0x86c)[0x10a79daec]

Not having done much debugging on buildfarm animals before, I don't
suppose there is any way to get access to the core file itself? I'd
like to see how many participants the batch barrier had at the time of
the assertion failure. I assume it was 2, but I want to make sure I
understand the race.
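
To frame what I mean by the race: the failed Assert expects the worker
preparing to scan for unmatched tuples to be the only participant still
attached to the batch barrier. Below is a toy, self-contained sketch
(invented names, not the PostgreSQL code; just C illustrating the
pattern) of how a second still-attached worker could make that check
observe 2 instead of 1:

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

typedef struct toy_barrier
{
    atomic_int  participants;   /* workers currently attached */
} toy_barrier;

static toy_barrier batch_barrier = { 2 };   /* both workers start attached */

static void *
worker(void *arg)
{
    int     id = (int) (long) arg;

    if (id == 0)
    {
        /*
         * Worker 0 has run out of outer tuples and wants to scan the hash
         * table for unmatched inner tuples.  It expects to be the last
         * participant, mirroring the failed Assert above.
         */
        int     n = atomic_load(&batch_barrier.participants);

        if (n != 1)
            printf("worker 0: saw %d participants, expected 1 -- the race\n", n);
        else
            printf("worker 0: last participant, safe to scan unmatched\n");
    }

    /* Both workers eventually detach from the barrier. */
    atomic_fetch_sub(&batch_barrier.participants, 1);
    return NULL;
}

int
main(void)
{
    pthread_t   t0, t1;

    pthread_create(&t0, NULL, worker, (void *) 0L);
    pthread_create(&t1, NULL, worker, (void *) 1L);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    return 0;
}

(Compile with -pthread; depending on scheduling it prints either
message, which is exactly why I'd like to confirm the participant count
from the core.)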

- Melanie
