Re: Weird failure with latches in curculio on v15

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Fujii Masao <fujii(at)postgresql(dot)org>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Weird failure with latches in curculio on v15
Date: 2023-02-03 05:35:48
Message-ID: 20230203053548.GA27055@nathanxps13
Lists: pgsql-hackers

On Thu, Feb 02, 2023 at 02:39:19PM -0800, Nathan Bossart wrote:
> Maybe we could just
> remove this exit-in-SIGTERM-handler business...

I've spent some time testing this. It seems to work pretty well, but only
if I keep the exit-on-SIGTERM logic in shell_restore(). Without that, I'm
seeing delayed shutdowns, which I assume means
HandleStartupProcInterrupts() isn't getting called (I'm still investigating
this). In any case, the fact that shell_restore() exits if the command
fails due to SIGTERM seems like an implementation detail that we won't
necessarily want to rely on once recovery modules are available. In short,
we seem to depend on the SIGTERM handling in RestoreArchivedFile() in order
to be responsive to shutdown requests.

One idea I have is to approximate the current behavior by simply checking
for the shutdown_requested flag before and after executing
restore_command. This seems to work as desired even if the exit-on-SIGTERM
logic is removed from shell_restore(). Unless there is some reason to
break out of system() (versus just waiting for the command to fail after it
receives SIGTERM), I think this approach should suffice.
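
To illustrate the idea, here is a minimal standalone sketch of the check-before-and-after pattern. It is not the attached patch and not PostgreSQL code: the names handle_sigterm, run_restore_command, and RestoreResult are hypothetical, and the flag below is only a stand-in for the startup process's shutdown_requested. It assumes the SIGTERM handler does nothing but set the flag, and that the command itself fails or finishes on its own after the signal arrives.

```c
#include <signal.h>
#include <stdlib.h>

/* Hypothetical stand-in for the startup process's shutdown_requested flag. */
static volatile sig_atomic_t shutdown_requested = 0;

/* The handler only records the request; it never exits on its own. */
static void
handle_sigterm(int signo)
{
    (void) signo;
    shutdown_requested = 1;
}

/* Hypothetical result codes, used only in this sketch. */
typedef enum
{
    RESTORE_OK,
    RESTORE_FAILED,
    RESTORE_SHUTDOWN
} RestoreResult;

/*
 * Run a restore command, checking for a pending shutdown request both
 * before and after it.  system() is not interrupted here; we simply wait
 * for the command to return and then look at the flag.
 */
static RestoreResult
run_restore_command(const char *command)
{
    int rc;

    /* A shutdown was requested before we started: skip the command. */
    if (shutdown_requested)
        return RESTORE_SHUTDOWN;

    rc = system(command);

    /* A shutdown arrived while the command ran: report it to the caller. */
    if (shutdown_requested)
        return RESTORE_SHUTDOWN;

    return (rc == 0) ? RESTORE_OK : RESTORE_FAILED;
}
```

A caller would install handle_sigterm with signal() or sigaction() and treat RESTORE_SHUTDOWN as the cue to exit promptly rather than retry. How quickly that happens still depends on how long the command takes to return after the signal.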

I've attached a draft patch.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
adjust_sigterm_handling.patch text/x-diff 3.2 KB
