|From:||Andres Freund <andres(at)anarazel(dot)de>|
|To:||"Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>|
|Cc:||fabriziomello(at)gmail(dot)com, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, tushar <tushar(dot)ahuja(at)enterprisedb(dot)com>, "[pgdg] Robert Haas" <robertmhaas(at)gmail(dot)com>, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, Rahila Syed <rahila(dot)syed(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>|
|Subject:||Re: Minimal logical decoding on standbys|
|Views:||Raw Message | Whole Thread | Download mbox | Resend email|
On 2021-04-07 13:32:18 -0700, Andres Freund wrote:
> While working on this I found a, somewhat substantial, issue:
> When the primary is idle, on the standby logical decoding via walsender
> will typically not process the records until further WAL writes come in
> from the primary, or until a 10s lapsed.
> The problem is that WalSndWaitForWal() waits for the *replay* LSN to
> increase, but gets woken up by walreceiver when new WAL has been
> flushed. Which means that typically walsenders will get woken up at the
> same time that the startup process will be - which means that by the
> time the logical walsender checks GetXLogReplayRecPtr() it's unlikely
> that the startup process already replayed the record and updated
> I think fixing this would require too invasive changes at this point. I
> think we might be able to live with 10s delay issue for one release, but
> it sure is ugly :(.
This is indeed pretty painful. It's a lot more regularly occuring if you
either have a slot disk, or you switch around the order of
WakeupRecovery() and WalSndWakeup() XLogWalRcvFlush().
- There's about which timeline to use. If you use pg_recvlogical and you
restart the server, you'll see errors like:
pg_recvlogical: error: unexpected termination of replication stream: ERROR: requested WAL segment 000000000000000000000003 has already been removed
the real filename is 000000010000000000000003 - i.e. the timeline is
This isn't too hard to fix, but definitely needs fixing.
- ResolveRecoveryConflictWithLogicalSlots() is racy - potentially
leading us to drop a slot that has been created since we signalled a
recovery conflict. See
for some very similar issues.
- Given the precedent of max_slot_wal_keep_size, I think it's wrong to
just drop the logical slots. Instead we should just mark them as
invalid, like InvalidateObsoleteReplicationSlots().
- There's no tests covering timeline switches, what happens if there's a
promotion if logical decoding is currently ongoing.
- The way ResolveRecoveryConflictWithLogicalSlots() builds the error
message is not good (and I've complained about it before...).
Unfortunately I think the things I have found are too many for me to
address within the given time. I'll send a version with a somewhat
polished set of the changes I made in the next few days...
|Next Message||Bharath Rupireddy||2021-04-08 04:12:23||Re: Set access strategy for parallel vacuum workers|
|Previous Message||Amit Langote||2021-04-08 03:32:43||Re: ModifyTable overheads in generic plans|