Re: Improve pg_sync_replication_slots() to wait for primary to advance

From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc: Japin Li <japinli(at)hotmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Improve pg_sync_replication_slots() to wait for primary to advance
Date: 2025-11-10 10:44:27
Message-ID: CAExHW5sMM0E04mXujKSaz6C0f_bgN1hpfA2HmXSF4z0jn1=aiA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Nov 10, 2025 at 3:01 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
>
>
> 2)
> ProcessSlotSyncAPIInterrupts
> slotsync_api_reread_config
> -- These also have API in it, but I do not have any better name
> suggestions here, we can retain the current ones and see what others
> say.

ProcessSlotSyncInterrupts() handles shutdown waiting,
ProcessSlotSyncAPIInterrupts doesn't. Why is this difference? It will
be good to explain why we need two different functions for worker and
SQL function and also explain the difference between them.

$primary->psql('postgres', "SELECT pg_drop_replication_slot('lsub2_slot');");
+$primary->psql('postgres', "SELECT pg_drop_replication_slot('lsub1_slot');");

I think, the intention behind dropping the slot is to be able to
create it again for the next test. But there is no comment here
explaining that. There is also no comment explaining why we are
dropping both slots here; when the test only needs dropping one.
That's going to create confusion. One might think that all the slots
need to be dropped at this stage, and drop and create any future slots
that are used by prior code, for example. At the end of this test, we
recreate the slot using pg_create_logical_replication_slot(), which is
different method of creating slot than this test does. Though I can
understand the reason, it's not apparent. Generally reusing slot names
across multiple tests (in this file) is a source of confusion. But at
least for the test you are adding, you could use a different slot name
to avoid confusion.
+# Create some DDL on the primary so that the slot lags behind the standby
+$primary->safe_psql('postgres', "CREATE TABLE push_wal (a int);");
+
+# Attempt to synchronize slots using API. The API will continue retrying
+# synchronization until the remote slot catches up.
+# The API will not return until this happens, to be able to make
+# further calls, call the API in a background process.
+my $log_offset = -s $standby1->logfile;
+
+my $h = $standby1->background_psql('postgres', on_error_stop => 0);
+
+$h->query_until(qr//, "SELECT pg_sync_replication_slots();\n");

If the standby does not receive the WAL corresponding to the DDL
before this function is executed, the slot will get synchronized
immediately. I think we have to make sure that the standby has
received the DDL before executing this function.

Also most of the code which uses query_until has pattern like this:
$h->query_until(qr/start/, q{\echo start
SQL command});
But we expect an empty string here. Why this difference?

I think we need a similar test to test promotion while the function is
waiting for the slot to become sync-ready.

SyncReplicationSlots() and the main loop in ReplSlotSyncWorkerMain()
are similar with some functional differences. Some part of their code
needs to be kept in sync in future. How do we achieve that? At least
we need a comment saying so in each of those patches and keep those
two codes in proximity.

--
Best Wishes,
Ashutosh Bapat

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message vignesh C 2025-11-10 10:52:15 Re: Logical Replication of sequences
Previous Message BharatDB 2025-11-10 10:41:21 Re: Non-blocking archiver process