Re: Missed condition-variable wakeups on FreeBSD

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Missed condition-variable wakeups on FreeBSD
Date: 2022-02-26 21:06:25
Message-ID: 20220226210625.GK9008@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Feb 26, 2022 at 02:07:05PM -0500, Tom Lane wrote:
> I don't know much about how gdb interacts with kernel calls on
> FreeBSD, but I speculate that the poll(2) call returns with EINTR
> after gdb releases the process, and then things resume fine,
> suggesting that we lost an interrupt somewhere.

I've seen some similar interactions with strace under linux causing a process
to be woken up or otherwise incur a different behavior (not necessarily
postgres).

> Thoughts? Ideas on debugging this?

Before attaching a debugger, figure out what syscall each process is in.
In linux, that's ps O wchan PID.

>> Besides trying to make the issue more likely as suggested above, it might be
>> worth checking if signalling the stuck processes with SIGUSR1 gets them
>> unstuck.

And SIGCONT.

Maybe already did this, but you can dump a corefile of the running processes to
allow future inspection.

--
Justin

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2022-02-26 21:07:20 Re: explain_regress, explain(MACHINE), and default to explain(BUFFERS) (was: BUFFERS enabled by default in EXPLAIN (ANALYZE))
Previous Message Andres Freund 2022-02-26 20:44:31 Re: Missed condition-variable wakeups on FreeBSD