Re: Minimal logical decoding on standbys

From: "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>
To: Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, <fabriziomello(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, tushar <tushar(dot)ahuja(at)enterprisedb(dot)com>, "[pgdg] Robert Haas" <robertmhaas(at)gmail(dot)com>, Rahila Syed <rahila(dot)syed(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Minimal logical decoding on standbys
Date: 2021-07-27 07:23:48
Message-ID: fa90206a-6bf1-2121-cf0c-15b0734a2731@amazon.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 7/19/21 12:13 PM, Ibrar Ahmed wrote:

> On Fri, Jul 16, 2021 at 1:07 PM Drouvot, Bertrand <bdrouvot(at)amazon(dot)com
> <mailto:bdrouvot(at)amazon(dot)com>> wrote:
>
> Hi Andres,
>
> On 6/22/21 12:38 PM, Drouvot, Bertrand wrote:
> > Hi Andres,
> >
> > On 6/14/21 7:41 AM, Drouvot, Bertrand wrote:
> >> Hi Andres,
> >>
> >> On 4/8/21 5:47 AM, Andres Freund wrote:
> >>> Hi,
> >>>
> >>> On 2021-04-07 13:32:18 -0700, Andres Freund wrote:
> >>>> While working on this I found a, somewhat substantial, issue:
> >>>>
> >>>> When the primary is idle, on the standby logical decoding via
> >>>> walsender
> >>>> will typically not process the records until further WAL writes
> >>>> come in
> >>>> from the primary, or until a 10s lapsed.
> >>>>
> >>>> The problem is that WalSndWaitForWal() waits for the *replay*
> LSN to
> >>>> increase, but gets woken up by walreceiver when new WAL has been
> >>>> flushed. Which means that typically walsenders will get woken
> up at
> >>>> the
> >>>> same time that the startup process will be - which means that
> by the
> >>>> time the logical walsender checks GetXLogReplayRecPtr() it's
> unlikely
> >>>> that the startup process already replayed the record and updated
> >>>> XLogCtl->lastReplayedEndRecPtr.
> >>>>
> >>>> I think fixing this would require too invasive changes at this
> >>>> point. I
> >>>> think we might be able to live with 10s delay issue for one
> >>>> release, but
> >>>> it sure is ugly :(.
> >>> This is indeed pretty painful. It's a lot more regularly
> occuring if
> >>> you
> >>> either have a slot disk, or you switch around the order of
> >>> WakeupRecovery() and WalSndWakeup() XLogWalRcvFlush().
> >>>
> >>> - There's about which timeline to use. If you use
> pg_recvlogical and
> >>> you
> >>>    restart the server, you'll see errors like:
> >>>
> >>>    pg_recvlogical: error: unexpected termination of replication
> >>> stream: ERROR:  requested WAL segment 000000000000000000000003
> has
> >>> already been removed
> >>>
> >>>    the real filename is 000000010000000000000003 - i.e. the
> timeline is
> >>>    0.
> >>>
> >>>    This isn't too hard to fix, but definitely needs fixing.
> >>
> >> Thanks, nice catch!
> >>
> >> From what I have seen, we are not going through
> InitXLOGAccess() on a
> >> Standby and in some cases (like the one you mentioned)
> >> StartLogicalReplication() is called without IdentifySystem() being
> >> called previously: this lead to ThisTimeLineID still set to 0.
> >>
> >> I am proposing a fix in the attached v18 by adding a check in
> >> StartLogicalReplication() and ensuring that ThisTimeLineID is
> retrieved.
> >>
> >>>
> >>> - ResolveRecoveryConflictWithLogicalSlots() is racy - potentially
> >>>    leading us to drop a slot that has been created since we
> signalled a
> >>>    recovery conflict.  See
> >>>
> https://www.postgresql.org/message-id/20210408020913.zzprrlvqyvlt5cyy%40alap3.anarazel.de
> <https://www.postgresql.org/message-id/20210408020913.zzprrlvqyvlt5cyy%40alap3.anarazel.de>
>
> >>>
> >>>    for some very similar issues.
> >>
> >> I have rewritten this part by following the same logic as the one
> >> used in 96540f80f8 (the commit linked to the thread you mentioned).
> >>
> >>>
> >>> - Given the precedent of max_slot_wal_keep_size, I think it's
> wrong to
> >>>    just drop the logical slots. Instead we should just mark
> them as
> >>>    invalid, like InvalidateObsoleteReplicationSlots().
> >>
> >> Makes fully sense and done that way in the attached patch.
> >>
> >> I am setting the slot's data.xmin and data.catalog_xmin as
> >> InvalidTransactionId to mark the slot(s) as invalid in case of
> conflict.
> >>
> >>> - There's no tests covering timeline switches, what happens if
> >>> there's a
> >>>    promotion if logical decoding is currently ongoing.
> >>
> >> I'll now work on the tests.
> >>
> >>>
> >>> - The way ResolveRecoveryConflictWithLogicalSlots() builds the
> error
> >>>    message is not good (and I've complained about it before...).
> >>
> >> I changed it and made it more simple.
> >>
> >> I also removed the details around mentioning xmin or catalog
> xmin (as
> >> I am not sure of the added value and they are currently also not
> >> mentioned during standby recovery snapshot conflict).
> >>
> >>>
> >>> Unfortunately I think the things I have found are too many for
> me to
> >>> address within the given time. I'll send a version with a somewhat
> >>> polished set of the changes I made in the next few days...
> >>
> >> Thanks for the review and feedback.
> >>
> >> Please find enclosed v18 with the changes I worked on.
> >>
> >> I still need to have a look on the tests.
> >
> > Please find enclosed v19 that also contains the changes related to
> > your TAP tests remarks, mainly:
> >
> > - get rid of 024 and add more tests in 026 (025 has been used in
> the
> > meantime)
> >
> > - test that logical decoding actually produces useful and
> correct results
> >
> > - test standby promotion and logical decoding behavior once done
> >
> > - useless "use" removal
> >
> > - check_confl_logicalslot() function removal
> >
> > - rewrote make_slot_active() to make use of poll_query_until() and
> > timeout
> >
> > - remove the useless eval()
> >
> > - remove the "Catalog xmins should advance after standby logical
> slot
> > fetches the changes" test
> >
> > One thing that's not clear to me is your remark "There's also no
> test
> > for a recovery conflict due to row removal": Don't you think
> that the
> > "vacuum full" conflict test is enough? if not, what kind of
> additional
> > tests would you like to see?
> >
> >>
> >> There is also the 10s delay to work on, do you already have an
> idea
> >> on how we should handle it?
> >>
> >> Thanks
> >>
> >> Bertrand
> >>
> > Thanks
> >
> > Bertrand
> >
> Please find enclosed v20 a needed rebase (nothing serious worth
> mentioning) of v19.
>
> FWIW, just to sum up that v19 (and so v20):
>
> - contained the changes (see details above) related to your TAP tests
> remarks
>
> - contained the changes (see details above) related to your code
> remarks
>
> There is still the 10s delay thing that need work: do you already
> have
> an idea on how we should handle it?
>
> And still one thing that's not clear to me is your remark "There's
> also
> no test for a recovery conflict due to row removal": Don't you think
> that the "vacuum full" conflict test is enough? if not, what kind of
> additional tests would you like to see?
>
> Thanks
>
> Bertrand
>
>
>
>
>
> The patch does not apply and an updated patch is required.
> patching file src/include/replication/slot.h
> Hunk #1 FAILED at 214.
> 1 out of 2 hunks FAILED -- saving rejects to file src/include/replication/slot.h.rej
>
Thanks for the warning, rebase done and new v21 version attached.

Bertrand

Attachment Content-Type Size
v21-0005-Doc-changes-describing-details-about-logical-dec.patch text/plain 2.1 KB
v21-0004-New-TAP-test-for-logical-decoding-on-standby.patch text/plain 16.9 KB
v21-0003-Allow-logical-decoding-on-standby.patch text/plain 17.2 KB
v21-0002-Handle-logical-slot-conflicts-on-standby.patch text/plain 29.3 KB
v21-0001-Add-info-in-WAL-records-in-preparation-for-logic.patch text/plain 18.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Drouvot, Bertrand 2021-07-27 07:25:22 Re: [UNVERIFIED SENDER] Re: pg_upgrade can result in early wraparound on databases with high transaction load
Previous Message Ajin Cherian 2021-07-27 06:59:14 Re: Failed transaction statistics to measure the logical replication progress