Re: Synchronizing slots from primary to standby

From: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Synchronizing slots from primary to standby
Date: 2022-03-09 13:01:41
Message-ID: CAE9k0PmzDkC+A3Ex2Qmw3fA3XMHONws8q75i1a+nr90ZDybTAA@mail.gmail.com
Lists: pgsql-hackers

Hi,

I have spent a little time trying to understand the concern raised by
Andres, and while doing so I could think of a couple of issues which I
would like to share here, although I'm not quite sure how in line these
are with the problems seen by Andres.

1) Firstly, what if the failover occurs after the confirmed flush lsn
has been updated on the primary, but before it has been updated on the
standby? I believe this may very well happen, especially considering
that the standby sends SQL queries to the primary at regular intervals
to synchronize the replication slots. If the primary dies just after
updating the confirmed flush lsn of its logical subscribers, the
standby may not get this update from the primary, which means we'll
probably end up with a broken logical replication slot on the new
primary.

2) Secondly, if the standby goes down, the logical subscribers will
stop receiving new changes from the primary, as per the design of this
patch. Likewise, if the standby lags behind the primary for whatever
reason, that will have a direct impact on the logical subscribers as
well.
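
The race in point 1 can be sketched with a toy model. This is for
illustration only and is not code from the patch: LSNs are plain
integers, time advances in abstract ticks, and the names Slot,
simulate, sync_interval and crash_tick are all made up for this
example. The only assumption taken from the discussion above is that
the standby copies the slot position from the primary at regular
intervals.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    """Hypothetical stand-in for a logical slot; only tracks one LSN."""
    confirmed_flush_lsn: int = 0


def simulate(sync_interval: int, crash_tick: int) -> tuple[int, int]:
    """Return (primary_lsn, standby_lsn) at the moment the primary dies.

    Each tick the subscriber confirms one more unit of WAL on the
    primary. The standby refreshes its copy of the slot only on ticks
    that are a multiple of sync_interval (the periodic sync).
    """
    primary = Slot()
    standby = Slot()
    for tick in range(1, crash_tick + 1):
        # The subscriber confirms a flush; only the primary sees it at once.
        primary.confirmed_flush_lsn += 1
        # Periodic sync: the standby polls the primary for the slot position.
        if tick % sync_interval == 0:
            standby.confirmed_flush_lsn = primary.confirmed_flush_lsn
    return primary.confirmed_flush_lsn, standby.confirmed_flush_lsn


# The primary dies three ticks after the last sync.
primary_lsn, standby_lsn = simulate(sync_interval=5, crash_tick=13)
lag = primary_lsn - standby_lsn  # confirmations the standby never saw
```

With these numbers the last sync happens at tick 10, so the slot on
the promoted standby still says 10 while the subscriber had already
confirmed 13. A crash that lands exactly on a sync tick leaves no gap,
which is why the problem would only show up intermittently.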

--
With Regards,
Ashutosh Sharma.

On Sat, Feb 19, 2022 at 3:53 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Hi,
>
> On 2022-02-11 15:28:19 +0100, Peter Eisentraut wrote:
> > On 05.02.22 20:59, Andres Freund wrote:
> > > On 2022-01-03 14:46:52 +0100, Peter Eisentraut wrote:
> > > > From ec00dc6ab8bafefc00e9b1c78ac9348b643b8a87 Mon Sep 17 00:00:00 2001
> > > > From: Peter Eisentraut<peter(at)eisentraut(dot)org>
> > > > Date: Mon, 3 Jan 2022 14:43:36 +0100
> > > > Subject: [PATCH v3] Synchronize logical replication slots from primary to
> > > > standby
> > > I've just skimmed the patch and the related threads. As far as I can tell this
> > > cannot be safely used without the conflict handling in [1], is that correct?
> >
> > This or similar questions have been asked a few times about this or similar
> > patches, but they always come with some doubt.
>
> I'm certain it's a problem - the only reason I couched it was that there could
> have been something clever in the patch preventing problems that I missed
> because I just skimmed it.
>
>
> > If we think so, it would be
> > useful perhaps if we could come up with test cases that would demonstrate
> > why that other patch/feature is necessary. (I'm not questioning it
> > personally, I'm just throwing out ideas here.)
>
> The patch as-is just breaks one of the fundamental guarantees necessary for
> logical decoding, that no row versions can be removed that are still required
> for logical decoding (signalled via catalog_xmin). So there needs to be an
> explicit mechanism upholding that guarantee, but there is not right now from
> what I can see.
>
> One piece of the referenced patchset is that it adds information about removed
> catalog rows to a few WAL records, and then verifies during replay that no
> record can be replayed that removes resources that are still needed. If such a
> conflict exists it's dealt with as a recovery conflict.
>
> That itself doesn't provide prevention against removal of required rows, but it
> provides detection. The prevention against removal can then be done using a
> physical replication slot with hot standby feedback or some other mechanism
> (e.g. slot syncing mechanism could maintain a "placeholder" slot on the
> primary for all sync targets or something like that).
>
> Even if that infrastructure existed / was merged, the slot sync stuff would
> still need some very careful logic to protect against problems due to
> concurrent WAL replay and "synchronized slot" creation. But that's doable.
>
> Greetings,
>
> Andres Freund
>
>
