Re: Synchronizing slots from primary to standby

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Synchronizing slots from primary to standby
Date: 2023-06-16 09:56:05
Message-ID: CAA4eK1LrZuDLGu6saZ8RDs7gKLM0dmafq+9oqRPKKN1cEEao_Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Apr 17, 2023 at 7:37 PM Drouvot, Bertrand
<bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
>
> Please find attached V5 (a rebase of V4 posted up-thread).
>
> In addition to the "rebasing" work, the TAP test adds a test about conflict handling (logical slot invalidation)
> relying on the work done in the "Minimal logical decoding on standby" patch series.
>
> I did not look more at the patch (than what's was needed for the rebase) but plan to do so.
>

Are you still planning to continue working on this? Some miscellaneous
comments while going through this patch are as follows?

1. Can you please try to explain the functionality of the overall
patch somewhere in the form of comments and or commit message?

2. It seems that the initially synchronized list of slots is only used
to launch a per-database worker to synchronize all the slots
corresponding to that database. If so, then why do we need to fetch
all the slot-related information via LIST_SLOTS command?

3. As mentioned in the initial email, I think it would be better to
replace LIST_SLOTS command with a SELECT query.

4. How the limit of sync_slot workers is decided? Can we document such
a piece of information? Do we need a new GUC to decide the number of
workers? Ideally, it would be better to avoid GUC, can we use any
existing logical replication workers related GUC?

5. Can we separate out the functionality related to standby_slot_names
in a separate patch, probably the first one? I think that will patch
easier to review.

6. In libpqrcv_list_slots(), two-phase related slot information is not
retrieved. Is there a reason for the same?

7.
+static void
+wait_for_standby_confirmation(XLogRecPtr commit_lsn)

Some comments atop this function would make it easier to review.

8.
+/*-------------------------------------------------------------------------
+ * slotsync.c
+ * PostgreSQL worker for synchronizing slots to a standby from primary
+ *
+ * Copyright (c) 2016-2018, PostgreSQL Global Development Group
+ *

The copyright notice is out-of-date.

9. Why synchronize_one_slot() compares
MyReplicationSlot->data.restart_lsn with the value of
confirmed_flush_lsn passed to it? Also, why it does only for new slots
but not existing slots?

10. Can we somehow test if the restart_lsn is advanced properly after
sync? I think it is important to ensure that because otherwise after
standby's promotion, the subscriber can start syncing from the wrong
position.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nikita Malakhov 2023-06-16 10:18:49 Re: Pluggable toaster
Previous Message Alvaro Herrera 2023-06-16 09:47:51 Re: Use the enum value CRS_EXPORT_SNAPSHOT instead of "0"