Re: Logical Replication of sequences

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Euler Taveira <euler(at)eulerto(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>
Subject: Re: Logical Replication of sequences
Date: 2025-10-23 06:15:17
Message-ID: CALDaNm0eQ3Uy0O97QEUB_r=whGjTnfmZFMKnm70QxKr+mAK-hg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 22 Oct 2025 at 10:36, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, Oct 21, 2025 at 8:11 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> >
> > On Tue, 21 Oct 2025 at 03:49, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > > ---
> > > /*
> > > - * Check and log a warning if the publisher has subscribed to the same table,
> > > - * its partition ancestors (if it's a partition), or its partition children (if
> > > - * it's a partitioned table), from some other publishers. This check is
> > > - * required in the following scenarios:
> > > + * Check and log a warning if the publisher has subscribed to the same relation
> > > + * (table or sequence), its partition ancestors (if it's a partition), or its
> > > + * partition children (if it's a partitioned table), from some other
> > > publishers.
> > > + * This check is required in the following scenarios:
> > > *
> > > * 1) For CREATE SUBSCRIPTION and ALTER SUBSCRIPTION ... REFRESH PUBLICATION
> > > * statements with "copy_data = true" and "origin = none":
> > > * - Warn the user that data with an origin might have been copied.
> > > - * - This check is skipped for tables already added, as incremental sync via
> > > - * WAL allows origin tracking. The list of such tables is in
> > > - * subrel_local_oids.
> > > + * - This check is skipped for tables and sequences already added, as
> > > + * incremental sync via WAL allows origin tracking. The list of
> > > such tables
> > > + * is in subrel_local_oids.
> > > *
> > > * 2) For CREATE SUBSCRIPTION and ALTER SUBSCRIPTION ... REFRESH PUBLICATION
> > > * statements with "retain_dead_tuples = true" and "origin = any", and for
> > > @@ -2338,13 +2440,19 @@ AlterSubscriptionOwner_oid(Oid subid, Oid newOwnerId)
> > > * - Warn the user that only conflict detection info for local changes on
> > > * the publisher is retained. Data from other origins may lack sufficient
> > > * details for reliable conflict detection.
> > > + * - This check targets for tables only.
> > > * - See comments atop worker.c for more details.
> > > + *
> > > + * 3) For ALTER SUBSCRIPTION ... REFRESH SEQUENCE statements with "origin =
> > > + * none":
> > > + * - Warn the user that sequence data from another origin might have been
> > > + * copied.
> > > */
> > >
> > > While this function is well documented, I find it's quite complex, and
> > > this patch adds to that complexity. The function has 9 arguments,
> > > making it difficult to understand which combinations of arguments
> > > enable which checks. For example, the function header comment doesn't
> > > explain when to use the only_sequences parameter. At first, I thought
> > > only_sequences should be set to true when checking if the publisher
> > > has subscribed to sequences from other publishers, but looking at the
> > > code, I discovered it doesn't check sequences when check_rdt is true:
> > >
> > > + if (walrcv_server_version(wrconn) < 190000 || check_rdt)
> > > + appendStringInfo(&cmd, query,
> > > + "(SELECT relid, TRUE as istable FROM
> > > pg_get_publication_tables(P.pubname))");
> > > + else if (only_sequences)
> > > + appendStringInfo(&cmd, query,
> > > + "(SELECT relid, FALSE as istable FROM
> > > pg_get_publication_sequences(P.pubname))");
> > > + else
> > > + appendStringInfo(&cmd, query,
> > > + "(SELECT relid, TRUE as istable FROM
> > > pg_get_publication_tables(P.pubname) UNION ALL"
> > > + " SELECT relid, FALSE as istable FROM
> > > pg_get_publication_sequences(P.pubname))");
> > > +
> > >
> > > I find that the complexity might stem from checking different cases in
> > > one function, but I don't have better ideas to improve the logic for
> > > now. I think we can at least describe what the caller can expect from
> > > specifying only_sequence to true.
> >
> > Split this function into check_publications_origin_sequences and
> > check_publications_origin_tables to reduce the complexity. After this
> > change we log two warnings if both tables and sequences are subscriber
> > to the same tables and sequences like:
> >
>
> I think the case where both WARNINGs will be displayed is rare so it
> should be okay as it simplifies the code quite a bit. Another thing is
> we need to query twice but as this happens during DDL and only for
> very specific cases that should also be okay. We can anyway merge
> these later if we see any problem with it but for now it would be
> better to prefer code simplicity.

I agree.

> When check_publications_origin_sequences() is called from Alter
> Subscription ... Refresh Publication ... or Create Subscription ...
> code path then shouldn't we check copy_data as well along with origin
> as none? Because, if copy_data is false, we should have added a
> sequence in the READY state, so we don't need to fetch its values.

Fixed this.

> I have added a few comments in this new function and made a number of
> other cosmetic improvements in the attached.

Thanks for the changes, I merged it.

The attached patch has the changes for the same.
I have also addressed the other comments: a) Shveta's comments at [1]
b) Peter's comments at [2] & [3] c) Shveta's 2nd patch comments at [4]
and d) Chao's comment#12 from [5] which was pending.

[1] - https://www.postgresql.org/message-id/CAJpy0uDesLXjpDiDs6fA8HMr419D2YrXb7tA10e9Bp%2BuCypZ_Q%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/CAHut%2BPtp%2BFMHgS-kaKZwWEoNeyomecUDGECvoCpfx4_eCUDyUA%40mail.gmail.com
[3] - https://www.postgresql.org/message-id/CAHut%2BPvx945dJGhMtf2Rv5p8Xn4xQke65MfO-UwK3cRPnsXFRQ%40mail.gmail.com
[4] - https://www.postgresql.org/message-id/CAJpy0uDxms8ynrHWXHGFuxDLA7QzDLLASqpqdWnD%3DLa8UJPt7Q%40mail.gmail.com
[5] - https://www.postgresql.org/message-id/598FC353-8E9A-4857-A125-740BE24DCBEB%40gmail.com

Regards,
Vignesh

Attachment Content-Type Size
v20251023-0001-Introduce-REFRESH-SEQUENCES-for-subscripti.patch application/octet-stream 55.3 KB
v20251023-0002-New-worker-for-sequence-synchronization-du.patch application/octet-stream 89.3 KB
v20251023-0003-Documentation-for-sequence-synchronization.patch application/octet-stream 21.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Lakhin 2025-10-23 07:00:00 Re: IO in wrong state on riscv64
Previous Message Michael Paquier 2025-10-23 05:12:24 Re: Making pg_rewind faster