Re: [HACKERS] make async slave to wait for lsn to be replayed

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Kartyshov Ivan <i(dot)kartyshov(at)postgrespro(dot)ru>, Euler Taveira <euler(at)eulerto(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [HACKERS] make async slave to wait for lsn to be replayed
Date: 2024-04-11 15:09:43
Message-ID: CAPpHfdsRK0j14qE2BnYAw7xJtmg0nsYZ+Ut4kyNogNQX4dMi=w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi, Heikki!

Thank you for your interest in the subject.

On Thu, Apr 11, 2024 at 1:46 AM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> On 07/04/2024 00:52, Alexander Korotkov wrote:
> > On Fri, Apr 5, 2024 at 9:15 PM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> >> I'm still concerned that WaitLSNCleanup is only called in ProcKill.
> >> Does this mean that if a process throws an error while waiting, it'll
> >> not get cleaned up until it exits? Maybe this is not a big deal, but it
> >> seems odd.
> >
> > I've added WaitLSNCleanup() to the AbortTransaction(). Just pushed
> > that together with the improvements upthread.
>
> Race condition:
>
> 1. backend: pg_wal_replay_wait('0/1234') is called. It calls WaitForLSN
> 2. backend: WaitForLSN calls addLSNWaiter('0/1234'). It adds the backend
> process to the LSN heap and returns
> 3. replay: rm_redo record '0/1234'
> 4. backend: WaitForLSN enters for-loop, calls GetXLogReplayRecPtr()
> 5. backend: current replay LSN location is '0/1234', so we exit the loop
> 6. replay: calls WaitLSNSetLatches()
>
> In a nutshell, it's possible for the loop in WaitForLSN to exit without
> cleaning up the process from the heap. I was able to hit that by adding
> a delay after the addLSNWaiter() call:
>
> > TRAP: failed Assert("!procInfo->inHeap"), File: "../src/backend/commands/waitlsn.c", Line: 114, PID: 1936152
> > postgres: heikki postgres [local] CALL(ExceptionalCondition+0xab)[0x55da1f68787b]
> > postgres: heikki postgres [local] CALL(+0x331ec8)[0x55da1f204ec8]
> > postgres: heikki postgres [local] CALL(WaitForLSN+0x139)[0x55da1f2052cc]
> > postgres: heikki postgres [local] CALL(pg_wal_replay_wait+0x18b)[0x55da1f2056e5]
> > postgres: heikki postgres [local] CALL(ExecuteCallStmt+0x46e)[0x55da1f18031a]
> > postgres: heikki postgres [local] CALL(standard_ProcessUtility+0x8cf)[0x55da1f4b26c9]
>
> I think there's a similar race condition if the timeout is reached at
> the same time that the startup process wakes up the process.

Thank you for catching this. I think WaitForLSN() just needs to call
deleteLSNWaiter() unconditionally after exit from the loop.

> > * At first, we check that pg_wal_replay_wait() is called in a non-atomic
> > * context. That is, a procedure call isn't wrapped into a transaction,
> > * another procedure call, or a function call.
> > *
>
> It's pretty unfortunate to have all these restrictions. It would be nice
> to do:
>
> select pg_wal_replay_wait('0/1234'); select * from foo;

This works for me, except that it needs "call" not "select".

# call pg_wal_replay_wait('0/1234'); select * from foo;
CALL
i
---
(0 rows)

> in a single multi-query call, to avoid the round-trip to the client. You
> can avoid it with libpq or protocol level pipelining, too, but it's more
> complicated.
>
> > * Secondly, according to PlannedStmtRequiresSnapshot(), even in an atomic
> > * context, CallStmt is processed with a snapshot. Thankfully, we can pop
> > * this snapshot, because PortalRunUtility() can tolerate this.
>
> This assumption that PortalRunUtility() can tolerate us popping the
> snapshot sounds very fishy. I haven't looked at what's going on there,
> but doesn't sound like a great assumption.

This is what PortalRunUtility() says about this.

/*
* Some utility commands (e.g., VACUUM) pop the ActiveSnapshot stack from
* under us, so don't complain if it's now empty. Otherwise, our snapshot
* should be the top one; pop it. Note that this could be a different
* snapshot from the one we made above; see EnsurePortalSnapshotExists.
*/

So, if the vacuum pops a snapshot when it needs to run without a
snapshot, then it's probably OK for other utilities. But I agree this
decision needs some consensus.

> If recovery ends while a process is waiting for an LSN to arrive, does
> it ever get woken up?

If the recovery target is promote, then the user will get an error.
If the recovery target is shutdown, then connection will get
interrupted. If the recovery target is pause, then waiting will
continue during the pause. Not sure about the latter case.

> The docs could use some-copy-editing, but just to point out one issue:
>
> > There are also procedures to control the progress of recovery.
>
> That's copy-pasted from an earlier sentence at the table that lists
> functions like pg_promote(), pg_wal_replay_pause(), and
> pg_is_wal_replay_paused(). The pg_wal_replay_wait() doesn't control the
> progress of recovery like those functions do, it only causes the calling
> backend to wait.
>
> Overall, this feature doesn't feel quite ready for v17, and IMHO should
> be reverted. It's a nice feature, so I'd love to have it fixed and
> reviewed early in the v18 cycle.

Thank you for your review. I've reverted this. Will repost this for early v18.

------
Regards,
Alexander Korotkov

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kirill Reshke 2024-04-11 15:20:56 Re: Allow non-superuser to cancel superuser tasks.
Previous Message Nathan Bossart 2024-04-11 14:58:34 Re: allow changing autovacuum_max_workers without restarting