Re: Minimal logical decoding on standbys

From: Andres Freund <andres(at)anarazel(dot)de>
To: "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>
Cc: fabriziomello(at)gmail(dot)com, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, tushar <tushar(dot)ahuja(at)enterprisedb(dot)com>, "[pgdg] Robert Haas" <robertmhaas(at)gmail(dot)com>, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, Rahila Syed <rahila(dot)syed(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Minimal logical decoding on standbys
Date: 2021-04-08 03:47:13
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 2021-04-07 13:32:18 -0700, Andres Freund wrote:
> While working on this I found a, somewhat substantial, issue:
> When the primary is idle, on the standby logical decoding via walsender
> will typically not process the records until further WAL writes come in
> from the primary, or until a 10s lapsed.
> The problem is that WalSndWaitForWal() waits for the *replay* LSN to
> increase, but gets woken up by walreceiver when new WAL has been
> flushed. Which means that typically walsenders will get woken up at the
> same time that the startup process will be - which means that by the
> time the logical walsender checks GetXLogReplayRecPtr() it's unlikely
> that the startup process already replayed the record and updated
> XLogCtl->lastReplayedEndRecPtr.
> I think fixing this would require too invasive changes at this point. I
> think we might be able to live with 10s delay issue for one release, but
> it sure is ugly :(.

This is indeed pretty painful. It's a lot more regularly occuring if you
either have a slot disk, or you switch around the order of
WakeupRecovery() and WalSndWakeup() XLogWalRcvFlush().

- There's about which timeline to use. If you use pg_recvlogical and you
restart the server, you'll see errors like:

pg_recvlogical: error: unexpected termination of replication stream: ERROR: requested WAL segment 000000000000000000000003 has already been removed

the real filename is 000000010000000000000003 - i.e. the timeline is

This isn't too hard to fix, but definitely needs fixing.

- ResolveRecoveryConflictWithLogicalSlots() is racy - potentially
leading us to drop a slot that has been created since we signalled a
recovery conflict. See
for some very similar issues.

- Given the precedent of max_slot_wal_keep_size, I think it's wrong to
just drop the logical slots. Instead we should just mark them as
invalid, like InvalidateObsoleteReplicationSlots().

- There's no tests covering timeline switches, what happens if there's a
promotion if logical decoding is currently ongoing.

- The way ResolveRecoveryConflictWithLogicalSlots() builds the error
message is not good (and I've complained about it before...).

Unfortunately I think the things I have found are too many for me to
address within the given time. I'll send a version with a somewhat
polished set of the changes I made in the next few days...


Andres Freund

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2021-04-08 04:12:23 Re: Set access strategy for parallel vacuum workers
Previous Message Amit Langote 2021-04-08 03:32:43 Re: ModifyTable overheads in generic plans