Re: Allow interrupts on waiting standby

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Allow interrupts on waiting standby
Date: 2017-04-03 05:53:41
Message-ID: CAB7nPqSMu1BAPgHK2aqmYm=7f__VzwvXDCVHqYLZzbH1HFbE-Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

On Fri, Mar 31, 2017 at 4:46 PM, Tsunakawa, Takayuki
<tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com> wrote:
> Thank you, but did you remove WL_LATCH_SET from WaitLatch() intentionally? I understood you added it for startup process to respond quickly to events other than the postmaster death. Why don't we restore WL_LATCH_SET? I won't object to not adding the flag if there's a reason.

It doesn't matter. MyLatch is never set in the context of the startup
process. The other code paths of the startup process using WaitLatch()
may be awaken by the latch set by the WAL receiver, but that does not
apply here.

> I'll mark this as ready for committer when I see WL_LATCH_SET added (optional)

Objection on this point.

> and you have reported that you did the following test cases:
> * Startup process vanishes immediately after postmaster dies, while it is waiting for a recovery conflict to be resolved.
> * Startup process vanishes immediately after "pg_ctl stop -m fast", while it is waiting for a recovery conflict to be resolved.
> * Startup process resumes WAL application when max_standby_{archive | streaming}_delay is changed from the default -1 to a short period, e.g. 10s, and "pg_ctl reload" is performed, while it is waiting for a recovery conflict to be resolved.

Fine for me, I have checked those multiple scenarios and the startup
process is more responsive. I have emulated the conflict with a
transaction doing repeatable read on the standby while the master was
deleting and vacuuming a table.

But, actually, after looking again at this patch with fresher eyes, I
am being a bit doubtful that this is really correct... Calling
HandleStartupProcInterrupts() is actually dangerous I think, because
it would reload the GUC context while replaying a record. But I think
that it is cleaner to reload the context after being done with a
record.

It seems to me that the correct way to do things would be to switch
all the conflict code paths to use a latch instead of pg_usleep, by
either use a dedicated latch like the recovery wakeup one, or the
MyLatch of the startup process. Then, when a signal arrives in, we
simply set up the conflict latch which wakes up what is waiting for
activity. The latch needs to be reset because calling WaitLatch().
Leaving immediately on WL_POSTMASTER_DEATH looks fine though as far as
I have tested.

As time is growing short, I am marking this patch as returned with
feedback. Thanks for the input, Tsunakawa-san!
--
Michael

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2017-04-03 05:57:24 Re: REINDEX CONCURRENTLY 2.0
Previous Message Craig Ringer 2017-04-03 05:46:08 Re: Logical decoding on standby