Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)
Date: 2023-06-18 01:03:48
Message-ID: eae2793c-280f-15b8-885f-d05a7cc314ae@enterprisedb.com
Lists: pgsql-hackers

On 2/7/23 01:09, Thomas Munro wrote:
> On Tue, Feb 7, 2023 at 1:06 PM Tomas Vondra
> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>> On 2/7/23 00:48, Thomas Munro wrote:
>>> On Tue, Feb 7, 2023 at 12:46 PM Tomas Vondra
>>> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>>>> No, I left the workload as it was for the first lockup, so `make check`
>>>> runs everything as is up until the "join" test suite.
>>>
>>> Wait, shouldn't that be join_hash?
>>
>> No, because join_hash does not exist on 11 (it was added in 12). Also,
>> it actually locked up like this - that's the lockup I reported on 28/1.
>
> Oh, good. I had been trying to repro with 12 here and forgot that you
> were looking at 11...

FYI it happened again, on a regular run of regression tests (I gave up
on trying to reproduce this - after some initial hits I didn't hit it in
a couple thousand tries, so I just added the machine back to the
buildfarm).

Anyway, same symptoms - lockup in join_hash on PG11, leader waiting on
WaitLatch and both workers waiting on BarrierArriveAndWait. I forgot
that running gdb on the second worker gets it unstuck, so I wasn't
able to collect more info.

What else do you think would be useful to collect next time?

It's hard to draw conclusions due to the low probability of the issue,
but it's pretty weird this only ever happened on 11 so far.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
bt.txt text/plain 11.7 KB
