stopgap fix for signal handling during restore_command

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Fujii Masao <fujii(at)postgresql(dot)org>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: stopgap fix for signal handling during restore_command
Date: 2023-02-23 23:15:03
Message-ID: 20230223231503.GA743455@nathanxps13
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Feb 22, 2023 at 09:48:10PM +1300, Thomas Munro wrote:
> On Tue, Feb 21, 2023 at 5:50 PM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
>> I'm happy to create a new thread if needed, but I can't tell if there is
>> any interest in this stopgap/back-branch fix. Perhaps we should just jump
>> straight to the long-term fix that Thomas is looking into.
>
> Unfortunately the latch-friendly subprocess module proposal I was
> talking about would be for 17. I may post a thread fairly soon with
> design ideas + list of problems and decision points as I see them, and
> hopefully some sketch code, but it won't be a proposal for [/me checks
> calendar] next week's commitfest and probably wouldn't be appropriate
> in a final commitfest anyway, and I also have some other existing
> stuff to clear first. So please do continue with the stopgap ideas.

Okay, here is a new thread...

Since v8.4, the startup process will proc_exit() immediately within its
SIGTERM handler while the restore_command executes via system(). Some
recent changes added unsafe code to the section where this behavior is
enabled [0]. The long-term fix likely includes moving away from system()
completely, but we may want to have a stopgap/back-branch fix while that is
under development.

I've attached a patch set for a proposed stopgap fix. 0001 simply moves
the extra code outside of the Pre/PostRestoreCommand() block so that only
system() is executed while the SIGTERM handler might proc_exit(). This
restores the behavior that was in place from v8.4 to v14, so I don't expect
it to be too controversial. 0002 adds code to startup's SIGTERM handler to
call _exit() instead of proc_exit() if we are in a forked process from
system(), etc. It also adds assertions to ensure proc_exit(), ProcKill(),
and AuxiliaryProcKill() are not called within such forked processes.

Thoughts?

[0] https://postgr.es/m/20230201105514.rsjl4bnhb65giyvo%40alap3.anarazel.de

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v6-0001-Move-extra-code-out-of-the-Pre-PostRestoreCommand.patch text/x-diff 2.0 KB
v6-0002-Don-t-proc_exit-in-startup-s-SIGTERM-handler-if-f.patch text/x-diff 2.8 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2023-02-23 23:16:50 Re: Weird failure with latches in curculio on v15
Previous Message Andrey Chudnovsky 2023-02-23 23:04:22 Re: [PoC] Federated Authn/z with OAUTHBEARER