Re: Logical Replication of sequences

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Euler Taveira <euler(at)eulerto(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>
Subject: Re: Logical Replication of sequences
Date: 2025-11-02 19:05:04
Message-ID: CALDaNm0kGfCS5OY7x=JzzAMbOcwh+ykpGtV8JOfN4gG131gAUw@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 31 Oct 2025 at 11:26, Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Hi Vignesh,
>
> For later.... here are some review comments for the documentation
> patch v20251030-0002
>
> ======
> doc/src/sgml/config.sgml
>
> wal_retrieve_retry_interval:
>
> 1.
> <para>
> - In logical replication, this parameter also limits how often a failing
> - replication apply worker or table synchronization worker will be
> - respawned.
> + In logical replication, this parameter also limits how quickly a
> + failing replication apply worker, table synchronization worker, or
> + sequence synchronization worker will be respawned.
> </para>
>
> I think you can simplify that.
>
> SUGGESTION
> In logical replication, this parameter also limits how quickly a
> failing replication apply worker, or table/sequence synchronization
> worker will be respawned.

Modified

> ~~~
>
> max_logical_replication_workers:
>
> 2.
> <para>
> Specifies maximum number of logical replication workers. This includes
> - leader apply workers, parallel apply workers, and table synchronization
> - workers.
> + leader apply workers, parallel apply workers, table synchronization
> + workers and a sequence synchronization worker.
> </para>
>
> I think you can simplify that.
>
> SUGGESTION
> This includes leader apply workers, parallel apply workers, and
> table/sequence synchronization workers.

Modified

> ~~~
>
> max_sync_workers_per_subscription:
>
> 3.
> <para>
> Maximum number of synchronization workers per subscription. This
> parameter controls the amount of parallelism of the initial data copy
> - during the subscription initialization or when new tables are added.
> + during the subscription initialization or when new tables or sequences
> + are added.
> </para>
>
> But, there is no parallelism at all for sequence copies, because there
> is only one sequencesync worker (as the following docs paragraph
> says), so maybe we do not need this docs change.
>
> NOTE -- see the comment #12 below, and maybe use wording like that.

Modified similarly

> ======
> doc/src/sgml/logical-replication.sgml
>
> Section 29.1 Publication:
>
> 4.
> Publications may currently only contain tables or sequences. Objects must be
> added explicitly, except when a publication is created using
> <literal>FOR TABLES IN SCHEMA</literal>, <literal>FOR ALL TABLES</literal>,
> - or <literal>FOR ALL SEQUENCES</literal>.
> + or <literal>FOR ALL SEQUENCES</literal>. Unlike tables, the state of
> + sequences can be synchronized at any time. For more information, see
> + <xref linkend="logical-replication-sequences"/>.
>
> Not sure about the wording "the state of". Maybe it can be simplified?
>
> SUGGESTION
> Unlike tables, sequences can be synchronized at any time.

Modified

> ~~~
>
> 5.
> + <listitem>
> + <para>
> + use <link linkend="sql-altersubscription-params-refresh-sequences">
> + <command>ALTER SUBSCRIPTION ... REFRESH SEQUENCES</command></link>
> + to re-synchronize all sequences.
> + </para>
> + </listitem>
>
> AFAIK it's not going to get any newly added sequences so it is not
> really "all sequences" so this seems misleading. I thought it should
> be like below.
>
> SUGGESTION
> use ALTER SUBSCRIPTION ... REFRESH SEQUENCES to re-synchronize all
> sequences currently known to the subscription.

Modified

> ~~~
>
> Section 29.7.1. Sequence Definition Mismatches:
>
> 6.
> + <para>
> + During sequence synchronization, the sequence definitions of the publisher
> + and the subscriber are compared. An error is logged listing all differing
> + sequences before the process exits. The apply worker detects this failure
> + and repeatedly respawns the sequence synchronization worker to retry until
> + all differences are resolved. See also
> + <link linkend="guc-wal-retrieve-retry-interval"><varname>wal_retrieve_retry_interval</varname></link>.
> + </para>
>
> It seems a bit misleading. e.g. AFAIK the "The apply worker detects
> this failure" is not true. IIUC, the apply worker simply finds some
> sequences that still have INIT state, so really it has no knowledge of
> failure at all, right?
>
> Consider rewording this part.
>
> SUGGESTION
> The sequence synchronization worker validates that sequence
> definitions match between publisher and subscriber. If mismatches
> exist, the worker logs an error identifying them and exits. The apply
> worker continues respawning the sequence synchronization worker until
> synchronization succeeds.

Modified

> ~~~
>
> Section 29.7.2. Refreshing Stale Sequences:
>
> 7.
> + <para>
> + Subscriber side sequence values may frequently become out of sync due to
> + updates on the publisher.
> + </para>
> + <para>
> + To verify, compare the sequence values between the publisher and
> + subscriber, and if necessary, execute
> + <link linkend="sql-altersubscription-params-refresh-sequences">
> + <command>ALTER SUBSCRIPTION ... REFRESH SEQUENCES</command></link>.
> + </para>
>
> I didn't see why the wording "To verify" was needed here. Below is a
> slightly simpler alternative for these paragraphs.
>
> SUGGESTION
> Subscriber sequence values drift out of sync as the publisher advances
> them. Compare values between publisher and subscriber, then run ALTER
> SUBSCRIPTION ... REFRESH SEQUENCES to resynchronize if necessary.

Modified

> ~~~
>
> Section 29.7.3. Examples.
>
> 8. GENERAL. Prompts in examples
>
> I think using prompts like "test_pub#" in the examples is frowned upon
> because it makes cutting directly from the examples more difficult.
> Similarly, the result of the commands is not shown.
>
> See other PG18 logical replication examples for why the current style
> is.... e.g. more like this:
> /* pub # */ CREATE TABLE t1(a int, b int, c text, PRIMARY KEY(a,c));
> /* pub # */ CREATE TABLE t2(d int, e int, f int, PRIMARY KEY(d));

Modified

> ~~~
>
> 9.
> + Re-synchronize all the sequences on the subscriber using
> + <link linkend="sql-altersubscription-params-refresh-sequences">
> + <command>ALTER SUBSCRIPTION ... REFRESH SEQUENCES</command></link>.
>
> SUGGESTION
> Re-synchronize all sequences known to the subscriber using...

Modified

> ~~~
>
> Section 29.9. Restrictions #
>
> 10.
> + then this should typically not be a problem. If, however, some kind of
> + switchover or failover to the subscriber database is intended, then the
> + sequences would need to be updated to the latest values, either by
> + executing <link linkend="sql-altersubscription-params-refresh-sequences">
> + <command>ALTER SUBSCRIPTION ... REFRESH SEQUENCES</command></link>
> + or by copying the current data from the publisher (perhaps using
> + <command>pg_dump</command>) or by determining a sufficiently high value
> + from the tables themselves.
>
> IIUC the "ALTER SUBSCRIPTION ... REFRESH SEQUENCES" is only going to
> resync the sequences that the subscriber already knew about. So, if
> you really wanted to get the latest of everything won't the user need
> to execute double-commands just in case there are some new sequences
> at the publisher?
>
> e.g.
> First, ALTER SUBSCRIPTION REFRESH PUBLICATION
> Then, ALTER SUBSCRIPTION REFRESH SEQUENCES

I felt we don't support DDL, so new one's created should be copied
using pg_dump. I felt existing is ok.

> ~~~
>
> Section 29.13.2. Subscribers #
>
> 11.
> - workers), plus some reserve for the table synchronization workers and
> - parallel apply workers.
> + workers), plus some reserve for the parallel apply workers, table
> synchronization workers, and a sequence
> + synchronization worker.
>
> I think this can be worded similar to the config.sgml
>
> SUGGESTION
> ... plus some reserve for the parallel apply workers, and
> table/sequence synchronization workers.

Modified

> ~~
>
> 12.
> <para>
> <link linkend="guc-max-sync-workers-per-subscription"><varname>max_sync_workers_per_subscription</varname></link>
> - controls the amount of parallelism of the initial data copy during the
> - subscription initialization or when new tables are added.
> + controls how many tables can be synchronized in parallel during
> + subscription initialization or when new tables are added. One additional
> + worker is also needed for sequence synchronization.
> </para>
>
> Oh, perhaps this is the wording that should have been used in
> config.sgml (review comment #3) to avoid implying about sequencesync
> workers helping with parallelism.

Used the sequence synchronization doc similar to here in #3

> ======
> doc/src/sgml/monitoring.sgml
>
> 13.
> <para>
> Type of the subscription worker process. Possible types are
> - <literal>apply</literal>, <literal>parallel apply</literal>, and
> - <literal>table synchronization</literal>.
> + <literal>apply</literal>, <literal>parallel apply</literal>,
> + <literal>table synchronization</literal>, and
> + <literal>sequence synchronization</literal>.
> </para></entry>
>
> This docs fragment probably belongs with the pg_stats patch, not here.

This is ok here as we display this for running process, the other
stats patch is mainly for errors.

> ======
> doc/src/sgml/ref/create_subscription.sgml
>
> 14.
> (see <xref linkend="sql-createtype"/> for more about send/receive
> - functions).
> + functions). This parameter is not applicable for sequences.
> </para>
>
> In many places for this page the patch says "This parameter is not
> applicable for sequences."
>
> IMO that is ambiguous. It is not clear if the parameter is silently
> ignored, if it will give an error, or what?
>
> Maybe you can clarify by saying "is ignored" or "has no effect",
> instead of "is not applicable"

Modified

The attached v20251102 version patch has the changes for the same.

Regards,
Vignesh

Attachment Content-Type Size
v20251102-0001-New-worker-for-sequence-synchronization-du.patch text/x-patch 67.0 KB
v20251102-0002-Documentation-for-sequence-synchronization.patch text/x-patch 20.5 KB
v20251102-0003-Add-seq_sync_error_count-to-subscription-s.patch text/x-patch 21.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Sami Imseih 2025-11-02 20:07:29 Re: Improve LWLock tranche name visibility across backends
Previous Message Tom Lane 2025-11-02 18:11:49 Re: [PATCH] Add a guc parameter to control limit clause adjust path cost.