Re: Physical replication slot advance is not persistent

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Physical replication slot advance is not persistent
Date: 2020-01-28 09:45:47
Message-ID: CAMsr+YGkLEJQFUVO+OPdwyP7nQosu7fwOZ2XzH7oKw4OG2f+Kw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 28 Jan 2020 at 16:01, Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Tue, Jan 21, 2020 at 02:07:30PM +0900, Michael Paquier wrote:
> > On Mon, Jan 20, 2020 at 11:00:14AM -0800, Andres Freund wrote:
> >> Hm. I'm think testing this with real LSNs is a better idea. What if the
> >> node actually already is past FF/FFFFFFFF at this point? Quite unlikely,
> >> I know, but still. I.e. why not get the current LSN after the INSERT,
> >> and assert that the slot, after the restart, is that?
> >
> > Sure. If not disabling autovacuum in the test, we'd just need to make
> > sure if that advancing is at least ahead of the INSERT position.
>
> Actually, as the advancing happens only up to this position we just
> need to make sure that the LSN reported by the slot is the same as the
> position advanced to. I have switched the test to just do that
> instead of using a fake LSN.
>
> > Anyway, I am still not sure if we should got down the road to just
> > mark the slot as dirty if any advancing is done and let the follow-up
> > checkpoint to the work, if the advancing function should do the slot
> > flush, or if we choose one and make the other an optional choice (not
> > for back-branches, obviously. Based on my reading of the code, my
> > guess is that a flush should happen at the end of the advancing
> > function.
>
> I have been chewing on this point for a couple of days, and as we may
> actually crash between the moment the slot is marked as dirty and the
> moment the slot information is made consistent, we still have a risk
> to have the slot go backwards even if the slot information is saved.
> The window is much narrower, but well, the docs of logical decoding
> mention that this risk exists. And the patch becomes much more
> simple without changing the actual behavior present since the feature
> has been introduced for logical slots. There could be a point in
> having a new option to flush the slot information, or actually a
> separate function to flush the slot information, but let's keep that
> for a future possibility.
>
> So attached is an updated patch which addresses the problem just by
> marking a physical slot as dirty if any advancing is done. Some
> documentation is added about the fact that an advance is persistent
> only at the follow-up checkpoint. And the tests are fixed to not use
> a fake LSN but instead advance to the latest LSN position produced.
>
> Any objections?

LGTM. Thankyou.

--
Craig Ringer http://www.2ndQuadrant.com/
2ndQuadrant - PostgreSQL Solutions for the Enterprise

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2020-01-28 09:47:08 Re: [HACKERS] Block level parallel vacuum
Previous Message Kyotaro Horiguchi 2020-01-28 09:12:07 Re: standby apply lag on inactive servers