Re: Weird failure with latches in curculio on v15

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(at)paquier(dot)xyz>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Fujii Masao <fujii(at)postgresql(dot)org>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Weird failure with latches in curculio on v15
Date: 2023-02-03 18:54:17
Message-ID: 20230203185417.GA243728@nathanxps13
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Feb 02, 2023 at 11:44:22PM -0800, Andres Freund wrote:
> On 2023-02-03 02:24:03 -0500, Tom Lane wrote:
>> Andres Freund <andres(at)anarazel(dot)de> writes:
>> > A workaround for the back branches could be to have a test in
>> > StartupProcShutdownHandler() that tests if MyProcPid == getpid(), and
>> > not do the proc_exit() if they don't match. We probably should just do
>> > an _exit() in that case.
>>
>> Might work.
>
> I wonder if we should add code complaining loudly about such a mismatch
> to proc_exit(), in addition to handling it more silently in
> StartupProcShutdownHandler(). Also, an assertion in
> [Auxiliary]ProcKill that proc->xid == MyProcPid == getpid() seems like a
> good idea.

From the discussion, it sounds like we don't want to depend on the child
process receiving/handling the signal, so we can't get rid of the
break-out-of-system() behavior (at least not in back-branches). I've put
together some work-in-progress patches for the stopgap/back-branch fix.

0001 is just v1-0001 from upthread. This moves Pre/PostRestoreCommand to
surround only the call to system(). I think this should get us closer to
pre-v15 behavior.

0002 adds the getpid() check mentioned above to
StartupProcShutdownHandler(), and it adds assertions to proc_exit() and
[Auxiliary]ProcKill().

0003 adds checks for shutdown requests before and after the call to
shell_restore(). IMO the Pre/PostRestoreCommand stuff is an implementation
detail for restore_command, so I think it behooves us to have some
additional shutdown checks that apply even for recovery modules. This
patch could probably be moved to the recovery modules thread.

Is this somewhat close to what folks had in mind?

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v4-0001-stopgap-fix-for-restore_command.patch text/x-diff 2.8 KB
v4-0002-do-not-call-proc_exit-in-process-forked-for-resto.patch text/x-diff 2.4 KB
v4-0003-handle-shutdown-requests-before-and-after-restori.patch text/x-diff 2.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Corey Huinker 2023-02-03 19:27:23 Re: proposal: psql: psql variable BACKEND_PID
Previous Message Nitin Jadhav 2023-02-03 18:48:04 Re: Fix GUC_NO_SHOW_ALL test scenario in 003_check_guc.pl