| From: | Peter Smith <smithpb2250(at)gmail(dot)com> | 
|---|---|
| To: | vignesh C <vignesh21(at)gmail(dot)com> | 
| Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Euler Taveira <euler(at)eulerto(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org> | 
| Subject: | Re: Logical Replication of sequences | 
| Date: | 2025-10-31 05:56:31 | 
| Message-ID: | CAHut+Ps4+hdUECoKsw7DSSY-S=b4p4CMVJO3cJWQYm5Zy0RpsA@mail.gmail.com | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
Hi Vignesh,
For later.... here are some review comments for the documentation
patch v20251030-0002
======
doc/src/sgml/config.sgml
wal_retrieve_retry_interval:
1.
        <para>
-        In logical replication, this parameter also limits how often a failing
-        replication apply worker or table synchronization worker will be
-        respawned.
+        In logical replication, this parameter also limits how quickly a
+        failing replication apply worker, table synchronization worker, or
+        sequence synchronization worker will be respawned.
        </para>
I think you can simplify that.
SUGGESTION
In logical replication, this parameter also limits how quickly a
failing replication apply worker, or table/sequence synchronization
worker will be respawned.
~~~
max_logical_replication_workers:
2.
        <para>
         Specifies maximum number of logical replication workers. This includes
-        leader apply workers, parallel apply workers, and table synchronization
-        workers.
+        leader apply workers, parallel apply workers, table synchronization
+        workers and a sequence synchronization worker.
        </para>
I think you can simplify that.
SUGGESTION
This includes leader apply workers, parallel apply workers, and
table/sequence synchronization workers.
~~~
max_sync_workers_per_subscription:
3.
        <para>
         Maximum number of synchronization workers per subscription. This
         parameter controls the amount of parallelism of the initial data copy
-        during the subscription initialization or when new tables are added.
+        during the subscription initialization or when new tables or sequences
+        are added.
        </para>
But, there is no parallelism at all for sequence copies, because there
is only one sequencesync worker (as the following docs paragraph
says), so maybe we do not need this docs change.
NOTE -- see the comment #12 below, and maybe use wording like that.
======
doc/src/sgml/logical-replication.sgml
Section 29.1 Publication:
4.
    Publications may currently only contain tables or sequences. Objects must be
    added explicitly, except when a publication is created using
    <literal>FOR TABLES IN SCHEMA</literal>, <literal>FOR ALL TABLES</literal>,
-   or <literal>FOR ALL SEQUENCES</literal>.
+   or <literal>FOR ALL SEQUENCES</literal>. Unlike tables, the state of
+   sequences can be synchronized at any time. For more information, see
+   <xref linkend="logical-replication-sequences"/>.
Not sure about the wording "the state of". Maybe it can be simplified?
SUGGESTION
Unlike tables, sequences can be synchronized at any time.
~~~
5.
+    <listitem>
+     <para>
+      use <link linkend="sql-altersubscription-params-refresh-sequences">
+      <command>ALTER SUBSCRIPTION ... REFRESH SEQUENCES</command></link>
+      to re-synchronize all sequences.
+     </para>
+    </listitem>
AFAIK it's not going to get any newly added sequences so it is not
really "all sequences" so this seems misleading. I thought it should
be like below.
SUGGESTION
use ALTER SUBSCRIPTION ... REFRESH SEQUENCES to re-synchronize all
sequences currently known to the subscription.
~~~
Section 29.7.1. Sequence Definition Mismatches:
6.
+   <para>
+    During sequence synchronization, the sequence definitions of the publisher
+    and the subscriber are compared. An error is logged listing all differing
+    sequences before the process exits. The apply worker detects this failure
+    and repeatedly respawns the sequence synchronization worker to retry until
+    all differences are resolved. See also
+    <link linkend="guc-wal-retrieve-retry-interval"><varname>wal_retrieve_retry_interval</varname></link>.
+   </para>
It seems a bit misleading. e.g. AFAIK the "The apply worker detects
this failure" is not true. IIUC, the apply worker simply finds some
sequences that still have INIT state, so really it has no knowledge of
failure at all, right?
Consider rewording this part.
SUGGESTION
The sequence synchronization worker validates that sequence
definitions match between publisher and subscriber. If mismatches
exist, the worker logs an error identifying them and exits. The apply
worker continues respawning the sequence synchronization worker until
synchronization succeeds.
~~~
Section 29.7.2. Refreshing Stale Sequences:
7.
+   <para>
+    Subscriber side sequence values may frequently become out of sync due to
+    updates on the publisher.
+   </para>
+   <para>
+    To verify, compare the sequence values between the publisher and
+    subscriber, and if necessary, execute
+    <link linkend="sql-altersubscription-params-refresh-sequences">
+    <command>ALTER SUBSCRIPTION ... REFRESH SEQUENCES</command></link>.
+   </para>
I didn't see why the wording "To verify" was needed here. Below is a
slightly simpler alternative for these paragraphs.
SUGGESTION
Subscriber sequence values drift out of sync as the publisher advances
them.  Compare values between publisher and subscriber, then run ALTER
SUBSCRIPTION ... REFRESH SEQUENCES to resynchronize if necessary.
~~~
Section 29.7.3. Examples.
8. GENERAL. Prompts in examples
I think using prompts like "test_pub#" in the examples is frowned upon
because it makes cutting directly from the examples more difficult.
Similarly, the result of the commands is not shown.
See other PG18 logical replication examples for why the current style
is.... e.g. more like this:
/* pub # */ CREATE TABLE t1(a int, b int, c text, PRIMARY KEY(a,c));
/* pub # */ CREATE TABLE t2(d int, e int, f int, PRIMARY KEY(d));
~~~
9.
+    Re-synchronize all the sequences on the subscriber using
+    <link linkend="sql-altersubscription-params-refresh-sequences">
+    <command>ALTER SUBSCRIPTION ... REFRESH SEQUENCES</command></link>.
SUGGESTION
Re-synchronize all sequences known to the subscriber using...
~~~
Section 29.9. Restrictions #
10.
+     then this should typically not be a problem.  If, however, some kind of
+     switchover or failover to the subscriber database is intended, then the
+     sequences would need to be updated to the latest values, either by
+     executing <link linkend="sql-altersubscription-params-refresh-sequences">
+     <command>ALTER SUBSCRIPTION ... REFRESH SEQUENCES</command></link>
+     or by copying the current data from the publisher (perhaps using
+     <command>pg_dump</command>) or by determining a sufficiently high value
+     from the tables themselves.
IIUC the "ALTER SUBSCRIPTION ... REFRESH SEQUENCES" is only going to
resync the sequences that the subscriber already knew about. So, if
you really wanted to get the latest of everything won't the user need
to execute double-commands just in case there are some new sequences
at the publisher?
e.g.
First, ALTER SUBSCRIPTION REFRESH PUBLICATION
Then, ALTER SUBSCRIPTION REFRESH SEQUENCES
~~~
Section 29.13.2. Subscribers #
11.
-    workers), plus some reserve for the table synchronization workers and
-    parallel apply workers.
+    workers), plus some reserve for the parallel apply workers, table
synchronization workers, and a sequence
+    synchronization worker.
I think this can be worded similar to the config.sgml
SUGGESTION
... plus some reserve for the parallel apply workers, and
table/sequence synchronization workers.
~~
12.
    <para>
     <link linkend="guc-max-sync-workers-per-subscription"><varname>max_sync_workers_per_subscription</varname></link>
-     controls the amount of parallelism of the initial data copy during the
-     subscription initialization or when new tables are added.
+     controls how many tables can be synchronized in parallel during
+     subscription initialization or when new tables are added. One additional
+     worker is also needed for sequence synchronization.
    </para>
Oh, perhaps this is the wording that should have been used in
config.sgml (review comment #3) to avoid implying about sequencesync
workers helping with parallelism.
======
doc/src/sgml/monitoring.sgml
13.
       <para>
        Type of the subscription worker process.  Possible types are
-       <literal>apply</literal>, <literal>parallel apply</literal>, and
-       <literal>table synchronization</literal>.
+       <literal>apply</literal>, <literal>parallel apply</literal>,
+       <literal>table synchronization</literal>, and
+       <literal>sequence synchronization</literal>.
       </para></entry>
This docs fragment probably belongs with the pg_stats patch, not here.
======
doc/src/sgml/ref/create_subscription.sgml
14.
           (see <xref linkend="sql-createtype"/> for more about send/receive
-          functions).
+          functions). This parameter is not applicable for sequences.
          </para>
In many places for this page the patch says "This parameter is not
applicable for sequences."
IMO that is ambiguous. It is not clear if the parameter is silently
ignored, if it will give an error, or what?
Maybe you can clarify by saying "is ignored" or "has no effect",
instead of "is not applicable"
======
Kind Regards,
Peter Smith.
Fujitsu Australia
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Hayato Kuroda (Fujitsu) | 2025-10-31 05:59:59 | RE: How can end users know the cause of LR slot sync delays? | 
| Previous Message | shveta malik | 2025-10-31 05:51:16 | Re: Improve pg_sync_replication_slots() to wait for primary to advance |