Re: Improve pg_sync_replication_slots() to wait for primary to advance

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: Ajin Cherian <itsajin(at)gmail(dot)com>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: Improve pg_sync_replication_slots() to wait for primary to advance
Date: 2025-07-09 04:53:40
Message-ID: CAJpy0uAOgNKATvyywJw2R3zRT9v2UDKu1ctQ03S8C03ws0-8OA@mail.gmail.com
Lists: pgsql-hackers

Please find a few more comments:

1)
In the pg_sync_replication_slots() documentation, we have this:

"Note that this function is primarily intended for testing and
debugging purposes and should be used with caution. Additionally, this
function cannot be executed if ...."

We can remove this part as well and change the text to:

"Note that this function cannot be executed if...."

2)
We removed the NOTE from logicaldecoding.sgml, but now the page does not
mention pg_sync_replication_slots() at all. We need to bring back the
change that was removed by [1] (or something along similar lines), which is this:

- <command>CREATE SUBSCRIPTION</command> during slot creation, and then calling
- <link linkend="pg-sync-replication-slots">
- <function>pg_sync_replication_slots</function></link>
- on the standby. By setting <link linkend="guc-sync-replication-slots">
+ <command>CREATE SUBSCRIPTION</command> during slot creation.
+ Additionally, enabling <link linkend="guc-sync-replication-slots">
+ <varname>sync_replication_slots</varname></link> on the standby
+ is required. By enabling <link linkend="guc-sync-replication-slots">

3)
wait_for_primary_slot_catchup():
+ /*
+ * It is possible to get null values for confirmed_lsn and
+ * catalog_xmin if on the primary server the slot is just created with
+ * a valid restart_lsn and slot-sync worker has fetched the slot
+ * before the primary server could set valid confirmed_lsn and
+ * catalog_xmin.
+ */

Do we need this special handling? We already have similar handling in
synchronize_slots(). Please see:
/*
 * If restart_lsn, confirmed_lsn or catalog_xmin is invalid but the
 * slot is valid, that means we have fetched the remote_slot in its
 * RS_EPHEMERAL state. In such a case, don't sync it; we can always
 * sync it in the next sync cycle when the remote_slot is persisted
 * and has valid lsn(s) and xmin values.
 */
if ((XLogRecPtrIsInvalid(remote_slot->restart_lsn) ||
     XLogRecPtrIsInvalid(remote_slot->confirmed_lsn) ||
     !TransactionIdIsValid(remote_slot->catalog_xmin)) &&
    remote_slot->invalidated == RS_INVAL_NONE)
    pfree(remote_slot);

Due to the above check in synchronize_slots(), we will not reach
wait_for_primary_slot_catchup() when either confirmed_lsn or
catalog_xmin is not yet initialized.

[1]: https://www.postgresql.org/message-id/CAJpy0uAD_La2vi%2BB%2BiSBbCYTMayMstvbF9ndrAJysL9t5fHtbQ%40mail.gmail.com

thanks
Shveta
