Re: Backends stunk in wait event IPC/MessageQueueInternal

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Japin Li <japinli(at)hotmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Backends stunk in wait event IPC/MessageQueueInternal
Date: 2022-05-17 03:31:24
Message-ID: CA+hUKGKGqjM1H8T7fNqmKUgmifDDyPEHRT7FdpxFLVOMyOKa0g@mail.gmail.com
Lists: pgsql-hackers

On Mon, May 16, 2022 at 3:45 PM Japin Li <japinli(at)hotmail(dot)com> wrote:
> Maybe using the __illumos__ macro would be more accurate.
>
> +#elif defined(WAIT_USE_EPOLL) && defined(HAVE_SYS_SIGNALFD_H) && \
> + !defined(__sun__)

Thanks, updated, and with a new commit message.

I don't know much about these OSes (though I used lots of Sun machines
during the Jurassic period). I know that there are three
distributions of illumos: OmniOS, SmartOS and OpenIndiana, and they
share the same kernel and base system. The off-list reports I
received about hangs and kernel panics were from OpenIndiana animals
hake and haddock, which are not currently reporting (I'll ask why),
and then their owner defined -DWAIT_USE_POLL to clear that up while we
waited for progress on his kernel panic bug report. I see that OmniOS
animal pollock is currently reporting and also uses -DWAIT_USE_POLL,
but I couldn't find any discussion about that.

Of course, you might be hitting some completely different problem,
given the lack of information. I'd be interested in the output of
"p *MyLatch" (to see whether the latch has already been set), and
whether "kill -URG PID" dislodges the stuck process. But given the open
kernel bug report that I've now been reminded of, I'm thinking about
pushing this anyway. Then we could ask the animal owners to remove
-DWAIT_USE_POLL so that they'd effectively be running with
-DWAIT_USE_EPOLL and -DWAIT_USE_SELF_PIPE, which would be more like
PostgreSQL 13, but people who want to reproduce the problem on the
illumos side could build with -DWAIT_USE_SIGNALFD.
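To make the build-flag interplay above concrete, here is a minimal sketch of how the compile-time selection could look. It is not the attached patch: the HAVE_* macros are defined by hand to simulate a system with both epoll and signalfd (normally configure would set them), the step-1 fallback is simplified to poll only, and wakeup_mechanism() is a hypothetical helper added purely for illustration.

```c
/*
 * Sketch only: compile-time choice of the latch wakeup mechanism.
 * Assumption: HAVE_SYS_EPOLL_H and HAVE_SYS_SIGNALFD_H are defined by
 * hand here; in a real build they would come from configure.
 */
#define HAVE_SYS_EPOLL_H 1
#define HAVE_SYS_SIGNALFD_H 1

/* Step 1: choose the wait primitive unless the builder forced one. */
#if defined(WAIT_USE_POLL) || defined(WAIT_USE_EPOLL)
/* manual choice (e.g. -DWAIT_USE_POLL), leave it alone */
#elif defined(HAVE_SYS_EPOLL_H)
#define WAIT_USE_EPOLL
#else
#define WAIT_USE_POLL
#endif

/* Step 2: choose how a signal handler wakes the waiting process. */
#if defined(WAIT_USE_SELF_PIPE) || defined(WAIT_USE_SIGNALFD)
/* manual choice (e.g. -DWAIT_USE_SIGNALFD), leave it alone */
#elif defined(WAIT_USE_EPOLL) && defined(HAVE_SYS_SIGNALFD_H) && \
	!defined(__illumos__)
#define WAIT_USE_SIGNALFD
#else
#define WAIT_USE_SELF_PIPE
#endif

/* Hypothetical helper: report which mechanism was compiled in. */
const char *
wakeup_mechanism(void)
{
#if defined(WAIT_USE_SIGNALFD)
	return "signalfd";
#else
	return "self-pipe";
#endif
}
```

Under these assumptions, a default build on illumos would fall through to the self-pipe, a build with -DWAIT_USE_POLL would also get the self-pipe, and a build with -DWAIT_USE_SIGNALFD would still get signalfd for anyone trying to reproduce the problem.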

Attachment: v3-0001-Don-t-trust-signalfd-on-illumos.patch (text/x-patch, 6.5 KB)
