Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)
Date: 2023-01-29 17:53:36
Message-ID: 20230129175336.dvplrhkoba3pjxpk@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2023-01-29 18:39:05 +0100, Tomas Vondra wrote:
> Will do, but I'll wait for another lockup to see how frequent it
> actually is. I'm now at ~90 runs total, and it didn't happen again yet.
> So hitting it after 15 runs might have been a bit of a luck.

Was there a difference in how much load there was on the machine between
"reproduced in 15 runs" and "not reproed in 90"? If indeed lack of barriers
is related to the issue, an increase in context switches could substantially
change the behaviour (in both directions). More intra-process context
switches can amount to "probabilistic barriers" because that'll be a
barrier. At the same time it can make it more likely that the relatively
narrow window in WaitEventSetWait() is hit, or lead to larger delays
processing signals.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2023-01-29 18:05:07 Re: Fix GUC_NO_SHOW_ALL test scenario in 003_check_guc.pl
Previous Message Andres Freund 2023-01-29 17:41:10 Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)