Re: [HACKERS] make async slave to wait for lsn to be replayed

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Kartyshov Ivan <i(dot)kartyshov(at)postgrespro(dot)ru>, Euler Taveira <euler(at)eulerto(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [HACKERS] make async slave to wait for lsn to be replayed
Date: 2024-04-10 22:46:04
Message-ID: b155606b-e744-4218-bda5-29379779da1a@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 07/04/2024 00:52, Alexander Korotkov wrote:
> On Fri, Apr 5, 2024 at 9:15 PM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>> I'm still concerned that WaitLSNCleanup is only called in ProcKill.
>> Does this mean that if a process throws an error while waiting, it'll
>> not get cleaned up until it exits? Maybe this is not a big deal, but it
>> seems odd.
>
> I've added WaitLSNCleanup() to the AbortTransaction(). Just pushed
> that together with the improvements upthread.

Race condition:

1. backend: pg_wal_replay_wait('0/1234') is called. It calls WaitForLSN
2. backend: WaitForLSN calls addLSNWaiter('0/1234'). It adds the backend
process to the LSN heap and returns
3. replay: rm_redo record '0/1234'
4. backend: WaitForLSN enters for-loop, calls GetXLogReplayRecPtr()
5. backend: current replay LSN location is '0/1234', so we exit the loop
6. replay: calls WaitLSNSetLatches()

In a nutshell, it's possible for the loop in WaitForLSN to exit without
cleaning up the process from the heap. I was able to hit that by adding
a delay after the addLSNWaiter() call:

> TRAP: failed Assert("!procInfo->inHeap"), File: "../src/backend/commands/waitlsn.c", Line: 114, PID: 1936152
> postgres: heikki postgres [local] CALL(ExceptionalCondition+0xab)[0x55da1f68787b]
> postgres: heikki postgres [local] CALL(+0x331ec8)[0x55da1f204ec8]
> postgres: heikki postgres [local] CALL(WaitForLSN+0x139)[0x55da1f2052cc]
> postgres: heikki postgres [local] CALL(pg_wal_replay_wait+0x18b)[0x55da1f2056e5]
> postgres: heikki postgres [local] CALL(ExecuteCallStmt+0x46e)[0x55da1f18031a]
> postgres: heikki postgres [local] CALL(standard_ProcessUtility+0x8cf)[0x55da1f4b26c9]

I think there's a similar race condition if the timeout is reached at
the same time that the startup process wakes up the process.

> * At first, we check that pg_wal_replay_wait() is called in a non-atomic
> * context. That is, a procedure call isn't wrapped into a transaction,
> * another procedure call, or a function call.
> *

It's pretty unfortunate to have all these restrictions. It would be nice
to do:

select pg_wal_replay_wait('0/1234'); select * from foo;

in a single multi-query call, to avoid the round-trip to the client. You
can avoid it with libpq or protocol level pipelining, too, but it's more
complicated.

> * Secondly, according to PlannedStmtRequiresSnapshot(), even in an atomic
> * context, CallStmt is processed with a snapshot. Thankfully, we can pop
> * this snapshot, because PortalRunUtility() can tolerate this.

This assumption that PortalRunUtility() can tolerate us popping the
snapshot sounds very fishy. I haven't looked at what's going on there,
but doesn't sound like a great assumption.

If recovery ends while a process is waiting for an LSN to arrive, does
it ever get woken up?

The docs could use some-copy-editing, but just to point out one issue:

> There are also procedures to control the progress of recovery.

That's copy-pasted from an earlier sentence at the table that lists
functions like pg_promote(), pg_wal_replay_pause(), and
pg_is_wal_replay_paused(). The pg_wal_replay_wait() doesn't control the
progress of recovery like those functions do, it only causes the calling
backend to wait.

Overall, this feature doesn't feel quite ready for v17, and IMHO should
be reverted. It's a nice feature, so I'd love to have it fixed and
reviewed early in the v18 cycle.

--
Heikki Linnakangas
Neon (https://neon.tech)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2024-04-10 23:03:08 Re: Improve eviction algorithm in ReorderBuffer
Previous Message Michael Paquier 2024-04-10 22:43:24 Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?