Re: pgsql: Add kqueue(2) support to the WaitEventSet API.

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Thomas Munro <tmunro(at)postgresql(dot)org>, pgsql-committers <pgsql-committers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgsql: Add kqueue(2) support to the WaitEventSet API.
Date: 2020-03-16 21:21:29
Message-ID: CA+hUKGKsTnvROYqx4hUjGpwvCTYi+=+da7sTPCw2Mf2Yxfdi4Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers pgsql-hackers

On Tue, Mar 17, 2020 at 9:30 AM Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:
> On 2020-Mar-17, Thomas Munro wrote:
> > Reproduced here. The problem seems to be that macOS's getppid()
> > returns the debugger's PID, while the debugger is attached. This
> > doesn't happen on FreeBSD (even though the debugger does internally
> > become the parent, getppid() is careful to return the "real" parent
> > PID so that user space doesn't notice this trickery; apparently Apple
> > made a different choice).
>
> Wow ... Yeah, that was a known problem with FreeBSD, see
> https://postgr.es/m/1292851036-sup-5399@alvh.no-ip.org
> Evidently FreeBSD must have fixed it, but macOS has not caught up with
> that ...

Oh, interesting. Sorry to bring a variant of this problem back.

> > The getppid() check is there to close a vanishingly rare race
> > condition: when creating a WaitEventSet, we ask the kernel to tell us
> > when the postmaster exits, but there is a possibility that the
> > postmaster has already exited; normally that causes an error with
> > errno == ESRCH (no such PID, it's already gone), but another unrelated
> > process might have started that has the same PID, so we check if our
> > ppid has changed after a successful return code. That's not going to
> > work under a debugger on this OS.
>
> Irk.

I'm now far away from my home Mac so I can't test until later but I
think we can fix this by double checking with the pipe:

- else if (event->events == WL_POSTMASTER_DEATH && PostmasterPid
!= getppid())
+ else if (event->events == WL_POSTMASTER_DEATH &&
+ PostmasterPid != getppid() &&
+ !PostmasterIsAliveInternal())
+ {
+ /*
+ * The extra PostmasterIsAliveInternal() check
prevents false alarms
+ * from systems where getppid() returns a debugger PID
while being
+ * traced.
+ */
set->report_postmaster_not_running = true;
+ }

The fast getppid() check will prevent the slow and redundant
PostmasterIsAliveInternal() check from being reached on production
systems, until the postmaster really is gone in the race scenario
described.

(Note that all of this per-lock-wait work will go away with
https://commitfest.postgresql.org/27/2452/, so I'm glad Alexander
found this now).

In response to

Responses

Browse pgsql-committers by date

  From Date Subject
Next Message Alvaro Herrera 2020-03-16 21:41:51 pgsql: Update comment
Previous Message Alvaro Herrera 2020-03-16 20:30:40 Re: pgsql: Add kqueue(2) support to the WaitEventSet API.

Browse pgsql-hackers by date

  From Date Subject
Next Message Laurenz Albe 2020-03-16 21:25:11 Re: Berserk Autovacuum (let's save next Mandrill)
Previous Message Bruce Momjian 2020-03-16 21:08:58 Re: Just for fun: Postgres 20?