Re: Logical Replication of sequences

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Euler Taveira <euler(at)eulerto(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>
Subject: Re: Logical Replication of sequences
Date: 2025-11-01 05:15:41
Message-ID: CALDaNm0dHW9b7iZ5XwstGLptmf=fNwgj8vAkC9HgJzz0Saw2HQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 31 Oct 2025 at 07:34, Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Hi Vignesh,
>
> Some review comments for v20251030-0001
>
> ======
> Commit message
>
> 1.
> 3) ALTER SUBSCRIPTION ... REFRESH SEQUENCES
> - (A new command introduced in PG19 by a prior patch.)
> - All sequences in pg_subscription_rel are reset to DATASYNC state.
> ~
>
> DATASYNC? Just above, it says the only possible states for sequences
> are INIT and READY.

Modified

> ======
> .../replication/logical/sequencesync.c
>
> 2.
> + * A single sequencesync worker is responsible for synchronizing all sequences
> + * in INIT state in pg_subscription_rel. It begins by retrieving the list of
> + * sequences flagged for synchronization. These sequences are then processed
> + * in batches, allowing multiple entries to be synchronized within a single
> + * transaction. The worker fetches the current sequence values and page LSNs
> + * from the remote publisher, updates the corresponding sequences on the local
> + * subscriber, and finally marks each sequence as READY upon successful
> + * synchronization.
> + *
>
> Those first 2 sentences seem repetitive because AFAIK "in INIT state"
> and "flagged for synchronization" are exactly the same thing.
>
> SUGGESTION
> A single sequencesync worker is responsible for synchronizing all
> sequences. It begins by retrieving the list of sequences that are
> flagged for needing synchronization (e.g. those with INIT state).

Modified with slight changes

> ~~~
>
> get_and_validate_seq_info:
>
> 3.
> +/*
> + * get_and_validate_seq_info
> + *
> + * Extracts remote sequence information from the tuple slot received from the
> + * publisher and validates it against the corresponding local sequence
> + * definition.
> + */
>
> Missing comma.
>
> /publisher and validates/publisher, and validates/

Modified

> ~~~
>
> 4.
> Now that this function is in 2 parts, I think each part should be
> clearly identified with comments, something like:
>
> PART 1:
> /*
> * 1. Extract sequence information from the tuple slot received from the
> * publisher
> */
>
> PART 2:
> /*
> * 2. Compare the remote sequence definition to the local sequence definition,
> * and report any discrepancies.
> */

I felt this is not required

> ~~~
>
> 5.
> + seqinfo_local = (LogicalRepSequenceInfo *) list_nth(seqinfos, seqidx);
> + seqinfo_local->found_on_pub = true;
> + *seqinfo = seqinfo_local;
>
> Is this separate `seqinfo_local` variable needed? It seems always
> unconditionally assigned to the parameter, so you might as well just
> do without the extra variable. Maybe just rename the parameter as
> `seqinfo_local`?

I preferred the existing as it is more readable

> ~~~
>
> 6.
> + if (!*sequence_rel) /* Sequence was concurrently dropped? */
> + return COPYSEQ_SKIPPED;
> +
> + tup = SearchSysCache1(SEQRELID, ObjectIdGetDatum(seqinfo_local->localrelid));
> + if (!HeapTupleIsValid(tup)) /* Sequence was concurrently dropped? */
> + return COPYSEQ_SKIPPED;
>
> Nit. IMO the code was easier to read in the previous patch version,
> when it was commented above the code like:
>
> /* Sequence was concurrently dropped */
> if (!*sequence_rel)
> return COPYSEQ_SKIPPED;
>
> /* Sequence was concurrently dropped */
> tup = SearchSysCache1(SEQRELID, ObjectIdGetDatum(seqinfo->localrelid));
> if (!HeapTupleIsValid(tup))
> return COPYSEQ_SKIPPED;

Updated

> ~~~
>
> copy_sequence:
>
> 7.
> +static CopySeqResult
> +copy_sequence(LogicalRepSequenceInfo *seqinfo, int64 last_value,
> + bool is_called, XLogRecPtr page_lsn, Oid seqowner)
> +{
> + UserContext ucxt;
> + AclResult aclresult;
> + bool run_as_owner = MySubscription->runasowner;
> + Oid seqoid = seqinfo->localrelid;
> +
> + /*
> + * Make sure that the sequence is copied as sequence owner, unless the
> + * user has opted out of that behaviour.
> + */
> + if (!run_as_owner)
> + SwitchToUntrustedUser(seqowner, &ucxt);
>
> The code/comment seems contradictory because IMO it is vague what
> runasowner member means -- which owner? AFAIK `run_as_owner` really
> means "run as *subscription* owner". Ideally, the
> MySubscription->runasowner member might be renamed to 'runassubowner'
> but maybe that is a bigger change than you want to make for this
> patch.
>
> So, maybe just the comment can be rewritten for more clarity.
>
> SUGGESTION:
> If the user did not opt to run as the owner of the subscription
> ('run_as_owner'), then copy the sequence as the owner of the sequence.
>
> (Also, make the similar comment change for the equivalent place in tablesync.c).

Updated

> ~~~
>
> copy_sequences:
>
> 8.
> + ereport(LOG,
> + errmsg("logical replication sequence synchronization for
> subscription \"%s\" - total unsynchronized: %d",
> + MySubscription->name, list_length(seqinfos)));
> +
> + while (cur_batch_base_index < list_length(seqinfos))
>
> Would it be tidier to declare int n_seqinfos = list_length(seqinfos);
> instead of using list_length() multiple times in this function.

Modified

> ~~~
>
> 9.
> + * We deliberately avoid acquiring a local lock on the sequence before
> + * querying the publisher to prevent potential distributed deadlocks
> + * in bi-directional replication setups. For instance, a concurrent
> + * ALTER SEQUENCE on one node might block this worker, while the
> + * worker's own local lock simultaneously blocks a similar operation
> + * on the other nod resulting in a circular wait that spans both nodes
> + * and remains undetected.
>
> typo: /nod/node/

Modified

> ~~~
>
> 10.
> + int64 last_value;
> + bool is_called;
> + XLogRecPtr page_lsn;
> + CopySeqResult sync_status;
> + LogicalRepSequenceInfo *seqinfo;
> +
> + CHECK_FOR_INTERRUPTS();
> +
> + if (ConfigReloadPending)
> + {
> + ConfigReloadPending = false;
> + ProcessConfigFile(PGC_SIGHUP);
> + }
> +
> + sync_status = get_and_validate_seq_info(slot, &sequence_rel,
> + &seqinfo, &page_lsn,
> + &last_value, &is_called);
> + if (sync_status == COPYSEQ_SUCCESS)
> + sync_status = copy_sequence(seqinfo, last_value, is_called,
> + page_lsn,
> + sequence_rel->rd_rel->relowner);
>
> Why does LogicalRepSequenceInfo only have to be metadata? Can't those
> `page_lsn`, `last_value`, and `is_called` also be made members of
> LogicalRepSequenceInfo, so you don't have to declare them and pass
> them all in and out as parameters here?

Modified

> ======
> src/test/subscription/t/036_sequences.pl
>
> 11.
> +$node_publisher->safe_psql(
> + 'postgres', qq(
> + DROP SEQUENCE regress_s4;
> +));
> +
> +# Verify that an error is logged for the missing sequence ('regress_s4').
> +$node_subscriber->wait_for_log(
> + qr/ERROR: ( [A-Z0-9]+:)? logical replication sequence
> synchronization failed for subscription "regress_seq_sub"\n.*DETAIL:.*
> Missing sequence\(s\) on publisher: \("public.regress_s4"\)/,
> + $log_offset);
>
> I felt that psql DROP code belongs below the comment too.
> Alternatively, add another comment for that DROP, like:
> # Drop the sequence ('regress_s4') in preparation for the next test

Modified

> ~~~
>
> 12.
> There is still a yet-to-be-implemented test combination as previously
> reported [1, comment #3], right?

I'm skipping this to keep minimal tests.

Apart from these, the following improvements were done:
a) The error message has been improved to split into 3 warning
messages a) missing sequences b) mismatch sequences and insufficient
privileges sequences and finally an error message as the original was
too long and difficult to map each of the hints.
b) errmsg_plural and errhint_plural were used for error messages to
differentiate based on sequence/sequences based on sequence count.
c) Changed the batch log message to debug1 level
d) Few of the comments were improved to add more clarity.

The attached v20251101 version patch has the changes for the same.

Regards,
Vignesh

Attachment Content-Type Size
v20251101-0003-Add-seq_sync_error_count-to-subscription-s.patch application/octet-stream 21.5 KB
v20251101-0001-New-worker-for-sequence-synchronization-du.patch application/octet-stream 67.0 KB
v20251101-0002-Documentation-for-sequence-synchronization.patch application/octet-stream 20.7 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2025-11-01 05:21:35 Re: meson's in-tree libpq header search order vs -Dextra_include_dirs
Previous Message Bryan Green 2025-11-01 04:40:58 Re: [PATCH] Add Windows support for backtrace_functions (MSVC only)