RE: Synchronizing slots from primary to standby

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: RE: Synchronizing slots from primary to standby
Date: 2024-04-01 00:56:29
Message-ID: OS0PR01MB57168C0BF24F48DD125E02F0943F2@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Friday, March 29, 2024 2:50 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Fri, Mar 29, 2024 at 6:36 AM Zhijie Hou (Fujitsu) <houzj(dot)fnst(at)fujitsu(dot)com>
> wrote:
> >
> >
> > Attach a new version patch which fixed an un-initialized variable
> > issue and added some comments.
> >
>
> The other approach to fix this issue could be that the slotsync worker get the
> serialized snapshot using pg_read_binary_file() corresponding to restart_lsn
> and writes those at standby. But there are cases when we won't have such a file
> like (a) when we initially create the slot and reach the consistent_point, or (b)
> also by the time the slotsync worker starts to read the remote snapshot file, the
> snapshot file could have been removed by the checkpointer on the primary (if
> the restart_lsn of the remote has been advanced in this window). So, in such
> cases, we anyway need to advance the slot. I think these could be optimizations
> that we could do in the future.
>
> Few comments:

Thanks for the comments.

> =============
> 1.
> - if (slot->data.database != MyDatabaseId)
> + if (slot->data.database != MyDatabaseId && !fast_forward)
> ereport(ERROR,
> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> errmsg("replication slot \"%s\" was not created in this database", @@ -526,7
> +527,7 @@ CreateDecodingContext(XLogRecPtr start_lsn,
> * Do not allow consumption of a "synchronized" slot until the standby
> * gets promoted.
> */
> - if (RecoveryInProgress() && slot->data.synced)
> + if (RecoveryInProgress() && slot->data.synced &&
> + !IsSyncingReplicationSlots())
>
>
> Add comments at both of the above places.

Added.

>
>
> 2.
> +extern XLogRecPtr pg_logical_replication_slot_advance(XLogRecPtr moveto,
> + bool *found_consistent_point);
> +
>
> This API looks a bit awkward as the functionality doesn't match the name. How
> about having a function with name
> LogicalSlotAdvanceAndCheckReadynessForDecoding(moveto,
> ready_for_decoding) with the same functionality as your patch has for
> pg_logical_replication_slot_advance() and then invoke it both from
> pg_logical_replication_slot_advance and slotsync.c. The function name is too
> big, we can think of a shorter name. Any ideas?

How about LogicalSlotAdvanceAndCheckDecodingState() Or just
LogicalSlotAdvanceAndCheckDecoding()? (I used the suggested
LogicalSlotAdvanceAndCheckReadynessForDecoding in this version, It can be renamed in
next version if we agree).

Attach the V3 patch which addressed above comments and Kuroda-san's
comments[1]. I also adjusted the tap-test to only check the confirmed_flush_lsn
after syncing, as the restart_lsn could be different from the remote one due to
the new slot_advance() call. I am also testing some optimization idea locally
and will share if ready.

[1] https://www.postgresql.org/message-id/TYCPR01MB1207757BB2A32B6815CE1CCE7F53A2%40TYCPR01MB12077.jpnprd01.prod.outlook.com

Best Regards,
Hou zj

Attachment Content-Type Size
v3-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch application/octet-stream 19.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2024-04-01 01:01:05 Re: Streaming I/O, vectored I/O (WIP)
Previous Message Tony Wayne 2024-04-01 00:47:33 Cant link libpq with a pg extension using cmake