Re: Logical Replication of sequences

From: vignesh C <vignesh21(at)gmail(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc: Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Euler Taveira <euler(at)eulerto(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>
Subject: Re: Logical Replication of sequences
Date: 2025-07-14 04:33:25
Message-ID: CALDaNm0QCp8ZP2n5rn3c4uB6YSpGQvWAxRTF9JYqk7hFjD9yPw@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 11 Jul 2025 at 14:26, shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> On Wed, Jul 9, 2025 at 4:11 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> >
> > > 3)
> > > SyncFetchRelationStates:
> > > Earlier the name was FetchTableStates. If we really want to use the
> > > 'Sync' keyword, we can name it FetchRelationSyncStates, as we are
> > > fetching sync-status only. Thoughts?
> >
> > Instead of FetchRelationSyncStates, I preferred FetchRelationStates,
> > and changed it to FetchRelationStates.
> >
>
> Okay, LGTM.
>
> > > 5)
> > > + if (!MyLogicalRepWorker->sequencesync_failure_time ||
> > > + TimestampDifferenceExceeds(MyLogicalRepWorker->sequencesync_failure_time,
> > > + now, wal_retrieve_retry_interval))
> > > + {
> > > + MyLogicalRepWorker->sequencesync_failure_time = 0;
> > > +
> > > + logicalrep_worker_launch(WORKERTYPE_SEQUENCESYNC,
> > > + MyLogicalRepWorker->dbid,
> > > + MySubscription->oid,
> > > + MySubscription->name,
> > > + MyLogicalRepWorker->userid,
> > > + InvalidOid,
> > > + DSM_HANDLE_INVALID);
> > > + break;
> > > + }
> > >
> > > We set sequencesync_failure_time to 0, but if logicalrep_worker_launch
> > > is not able to launch the worker due to some reason, next time it will
> > > not even wait for 'wal_retrieve_retry_interval time' to attempt
> > > restarting it again. Is that intentional?
> > >
> > > In other workflows such as while launching table-sync or apply worker,
> > > this scenario does not arise. This is because we maintain start_time
> > > there which can never be 0 instead of failure time and before
> > > attempting to start the workers, we set start_time to current time.
> > > The seq-sync failure-time OTOH is only set to non-null in
> > > logicalrep_seqsyncworker_failure() and it is not necessary that we
> > > will hit that function as the logicalrep_worker_launch() may fail
> > > before that itself. Do you think we shall maintain start-time instead
> > > of failure-time for seq-sync worker as well? Or is there any other way
> > > to handle it?
> >
> > I preferred the suggestion from [1]. Modified it accordingly.
>
> Okay, works for me.
>
> > The attached v20250709 version patch has the changes for the same.
> >
>
> Thank You for the patches. Please find a few comments:
>
> 1)
> Shall we update pg_publication doc as well to indicate that pubinsert,
> pubupdate, pubdelete , pubtruncate, pubviaroot are meaningful only
> when publishing tables. For sequences, these have no meaning.

Since it is clearly mentioned it is for tables, I felt no need to
mention again it is not applicable for sequences.

> 2)
> Shall we have walrcv_disconnect() after copy is done in
> LogicalRepSyncSequences()

There is a cleanup function registered for the worker to handle this
at the worker exit. So this is not required.

> 3)
> Do we really need for-loop in ProcessSyncingSequencesForApply? I think
> this function is inspired from ProcessSyncingTablesForApply() but
> there we need different tablesync workers for different tables. For
> sequences, that is not the case and thus for-loop can be omitted. If
> we do so, we can amend the comments too where it says " Walk over all
> subscription sequences....."

Handled in the previous version

> 4)
> +# Confirm that the warning for parameters differing is logged.
> +$node_subscriber->wait_for_log(
>
> We can drop regress_seq_sub on the publisher now and check for missing
> warnings as the next step.

Modified

> 5)
> I am revisiting the test given in [1], I see there is some document change as:
>
> + Incremental sequence changes are not replicated. Although the data in
> + serial or identity columns backed by sequences will be replicated as part
> + of the table, the sequences themselves do not replicate ongoing changes.
> + On the subscriber, a sequence will retain the last value it synchronized
> + from the publisher. If the subscriber is used as a read-only database,
> + then this should typically not be a problem. If, however, some kind of
> + switchover or failover to the subscriber database is intended, then the
> + sequences would need to be updated to the latest values, either by
> + executing <link
> linkend="sql-altersubscription-params-refresh-publication-sequences">
> + <command>ALTER SUBSCRIPTION ... REFRESH PUBLICATION
> SEQUENCES</command></link>
> + or by copying the current data from the publisher (perhaps using
> + <command>pg_dump</command>) or by determining a sufficiently high value
> + from the tables themselves.
>
> But this doc specifically mentions a failover case. It does not
> mention the case presented in [1] i.e. if user is trying to use
> sequence to populate identity column of a "subscribed" table where the
> sequence is also synced originally from publisher, then he may end up
> with corrupted state
> of IDENTITY column, and thus such cases should be used with caution.
> Please review once and see if we need to mention this and the example
> too.

In this case, the identity column data—as well as the non-identity
columns—will be sent by the publisher as part of the row data. This
behavior appears consistent with how non-sequence objects are handled
in a publication.
The following documentation note should be sufficient, as it already
clarifies that "it will retain the last value it synchronized from the
publisher":
On the subscriber, a sequence will retain the last value it
synchronized from the publisher. If the subscriber is used as a
read-only database, this should typically not pose a problem.
But if you have something in mind which should be added let me know.

The attached patch has the changes for the same.

Regards,
Vignesh

Attachment Content-Type Size
v20250714-0001-Introduce-pg_sequence_state-function-for-e.patch application/octet-stream 7.3 KB
v20250714-0004-Introduce-REFRESH-PUBLICATION-SEQUENCES-fo.patch application/octet-stream 43.1 KB
v20250714-0003-Reorganize-tablesync-Code-and-Introduce-sy.patch application/octet-stream 23.0 KB
v20250714-0005-New-worker-for-sequence-synchronization-du.patch application/octet-stream 73.3 KB
v20250714-0002-Introduce-ALL-SEQUENCES-support-for-Postgr.patch application/octet-stream 104.8 KB
v20250714-0006-Documentation-for-sequence-synchronization.patch application/octet-stream 33.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2025-07-14 04:52:48 Re: track needed attributes in plan nodes for executor use
Previous Message wenhui qiu 2025-07-14 03:42:03 Add last_(auto)vacuum_duration column to pg_stat_all_tables