Re: Synchronizing slots from primary to standby

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Synchronizing slots from primary to standby
Date: 2024-04-03 03:33:54
Message-ID: CAA4eK1K_mmq1p6T97kg=ANhD-Jki1qO_Cqy1d6m1AoU3PRrTuw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Apr 2, 2024 at 7:42 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> On Tue, Apr 2, 2024 at 7:25 PM Zhijie Hou (Fujitsu)
> <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > > 1. Can we just remove pg_logical_replication_slot_advance and use
> > > LogicalSlotAdvanceAndCheckSnapState instead? If worried about the
> > > function naming, LogicalSlotAdvanceAndCheckSnapState can be renamed to
> > > pg_logical_replication_slot_advance?
> > >
> > > + * Advance our logical replication slot forward. See
> > > + * LogicalSlotAdvanceAndCheckSnapState for details.
> > > */
> > > static XLogRecPtr
> > > pg_logical_replication_slot_advance(XLogRecPtr moveto)
> > > {
> >
> > It was commented[1] that it's not appropriate for the
> > pg_logical_replication_slot_advance to have an out parameter
> > 'ready_for_decoding' which looks bit awkward as the functionality doesn't match
> > the name, and is also not consistent with the style of
> > pg_physical_replication_slot_advance(). So, we added a new function.
>
> I disagree here. A new function just for a parameter is not that great
> IMHO.
>

It is not for the parameter but primarily for the functionality it
provides. The additional functionality of whether we reached a
consistent point while advancing the slot doesn't sound to suit the
current function. Also, we want to keep the signature similar to the
existing function pg_physical_replication_slot_advance().

>
> I'd just rename LogicalSlotAdvanceAndCheckSnapState(XLogRecPtr
> moveto, bool *found_consistent_snapshot) to
> pg_logical_replication_slot_advance(XLogRecPtr moveto, bool
> *found_consistent_snapshot) and use it. If others don't like this, I'd
> at least turn pg_logical_replication_slot_advance(XLogRecPtr moveto) a
> static inline function.
>

Yeah, we can do that but it is not a performance-sensitive routine so
don't know if it is worth it.

> > > 5. As far as the test case for this issue is concerned, I'm fine with
> > > adding one using an INJECTION point because we seem to be having no
> > > consistent way to control postgres writing current snapshot to WAL.
> >
> > Since me and my colleagues can reproduce the issue consistently after applying
> > 0002 and it could be rare for concurrent xl_running_xacts to happen, we discussed[2] to
> > consider adding the INJECTION point after pushing the main fix.
>
> Right.
>
> > > 7.
> > > + /*
> > > + * We need to access the system tables during decoding to build the
> > > + * logical changes unless we are in fast-forward mode where no changes
> > > are
> > > + * generated.
> > > + */
> > > + if (slot->data.database != MyDatabaseId && !fast_forward)
> > >
> > > May I know if we need this change for this fix?
> >
> > The slotsync worker needs to advance the slots from different databases in
> > fast_forward. So, we need to skip this check in fast_forward mode. The analysis can
> > be found in [3].
> - if (slot->data.database != MyDatabaseId)
> + /*
> + * We need to access the system tables during decoding to build the
> + * logical changes unless we are in fast_forward mode where no changes are
> + * generated.
> + */
> + if (slot->data.database != MyDatabaseId && !fast_forward)
> ereport(ERROR,
>
> It's not clear from the comment that we need it for a slotsync worker
> to advance the slots from different databases. Can this be put into
> the comment? Also, specify in the comment, why this is safe?
>

It is not specific to slot sync worker but specific to fast_forward
mode. There is already a comment "We need to access the system tables
during decoding to build the logical changes unless we are in
fast_forward mode where no changes are generated." telling why it is
safe. The point is we need database access to access system tables
while generating the logical changes and in fast-forward mode, we
don't generate logical changes so this check is not required. Do let
me if you have a different understanding or if my understanding is
incorrect.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2024-04-03 03:35:43 Re: archive modules loose ends
Previous Message jian he 2024-04-03 03:30:08 Re: remaining sql/json patches