Re: Synchronizing slots from primary to standby

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: Synchronizing slots from primary to standby
Date: 2023-11-09 10:54:11
Message-ID: CAJpy0uC2gYybBcvYCW68wvR-3akWnFbr7x66nZg7XTyjELW9iw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Nov 9, 2023 at 8:56 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Thu, Nov 9, 2023 at 8:11 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Wed, Nov 8, 2023 at 8:09 PM Drouvot, Bertrand
> > <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> > >
> > > > Unrelated to the above: if there is a user slot on the standby with
> > > > the same name as one the slot-sync worker is trying to create, should
> > > > it emit a warning and skip the sync of that slot, or should it throw
> > > > an error?
> > > >
> > >
> > > I'd vote for emitting a warning and moving on to the next slot, if any.
> > >
> >
> > But then it could take time for users to learn of the actual problem,
> > and they would probably only notice it after failover. OTOH, if we
> > throw an error, they will probably find out earlier because the slot
> > sync mechanism would be stopped. Do you have reasons to prefer giving
> > a WARNING and skipping the creation of such slots? I expect this
> > WARNING to keep getting repeated in the logs, because each consecutive
> > sync attempt will generate it again.
> >
>
> Apart from the above, I would like to discuss the slot-sync work
> distribution strategy of this patch. The current implementation, as
> explained in the commit message [1], works well if the slots belong
> to multiple databases. It is clear from the data in emails [2][3][4]
> that having more workers really helps when the slots belong to
> multiple databases. But if all the slots belong to one or very few
> databases, such a strategy won't be as good. So, on one hand, we get
> very good numbers for a particular workload with the strategy used in
> the patch, but OTOH it may not adapt well to various other kinds of
> workloads. The question, then, is whether we should try to optimize
> this strategy for various kinds of workloads, or whether for the
> first version we should use a single slot-sync worker and enhance the
> functionality in later patches, either in PG17 itself or in PG18 or
> later versions. One thing to note is that a lot of the complexity of
> the patch is attributable to the multi-worker strategy, which may
> still not be efficient, so there is an argument for going with a
> simpler single slot-sync worker strategy and enhancing it in future
> versions as we learn more about various workloads. That would also
> help us develop this feature incrementally, instead of doing
> everything in one go and taking much longer than it should.
>
> Thoughts?
>
> [1] - "The replication launcher on the physical standby queries
> primary to get the list of dbids for failover logical slots. Once it
> gets the dbids, if dbids < max_slotsync_workers, it starts only that
> many workers, and if dbids > max_slotsync_workers, it starts
> max_slotsync_workers and divides the work equally among them. Each
> worker is then responsible to keep on syncing the logical slots
> belonging to the DBs assigned to it.
>
> Each slot-sync worker will have its own dbids list. Since the upper
> limit of this dbid-count is not known, it needs to be handled using
> dsa. We initially allocated memory to hold 100 dbids for each worker.
> If this limit is exhausted, we reallocate this memory with size
> incremented again by 100."
>
> [2] - https://www.postgresql.org/message-id/CAJpy0uD2F43avuXy_yQv7Wa3kpUwioY_Xn955xdmd6vX0ME6%3Dg%40mail.gmail.com
> [3] - https://www.postgresql.org/message-id/CAFPTHDZw2G3Pax0smymMjfPqdPcZhMWo36f9F%2BTwNTs0HFxK%2Bw%40mail.gmail.com
> [4] - https://www.postgresql.org/message-id/CAJpy0uD%3DDevMxTwFVsk_%3DxHqYNH8heptwgW6AimQ9fbRmx4ioQ%40mail.gmail.com
>
> --
> With Regards,
> Amit Kapila.
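
To make the scheme quoted in [1] concrete, it amounts to roughly the
following backend-style C (a minimal sketch only; launch_slotsync_workers,
slotsync_worker_start, and slotsync_worker_add_db are illustrative names,
not the patch's actual functions):

/*
 * Minimal sketch of the distribution scheme described in [1]. All
 * function names below are illustrative, not the patch's actual API.
 */
static void
launch_slotsync_workers(Oid *dbids, int ndbids)
{
    int     nworkers = Min(ndbids, max_slotsync_workers);

    /* Start min(dbids, max_slotsync_workers) workers. */
    for (int i = 0; i < nworkers; i++)
        slotsync_worker_start(i);

    /* Deal the dbids out round-robin so each worker gets an equal share. */
    for (int i = 0; i < ndbids; i++)
        slotsync_worker_add_db(i % nworkers, dbids[i]);
}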

PFA v32 patches, which have the below changes:

1) Changed how standby_slot_names is handled. On reanalysis, the
logical decoding context might not be the best place to cache the
standby slot list, because not all of the callers (1. user backend,
2. walsender, 3. slot-sync worker) can access the logical decoding
ctx. To make access to the list consistent, we cache the list in a
global variable instead. Also, to avoid the trouble of allocating and
freeing the list in various places, we [re]initialize the list in the
GUC assign hook, which makes the list easier for callers to use (see
the first sketch after this list).

2) Changed 'bool synced' in ReplicationSlotPersistentData to 'char
sync_state'. Its values are:
'n': none, for user-created slots;
'i': sync initiated for the slot, but still waiting for the primary
to catch up;
'r': ready for periodic syncs.

3) Improved the slot-creation logic in the slot-sync worker. An
active slot's sync is no longer blocked by an inactive slot's
creation. The worker pings the primary server a fixed number of
times, waiting for it to catch up with the local slot's LSN, and then
moves on to the next slot; it reattempts the wait for pending slots
in the next sync-cycle. Meanwhile, any such slot (still waiting for
the primary to catch up) is not dropped, but its sync_state is marked
'i'. Once the worker finishes initialization for such a slot (in any
of the sync-cycles), the slot's sync_state is changed to 'r' (see the
second sketch after this list).

4) Slots in state 'i' are dropped by the slot-sync worker when it
finds that the server is no longer in standby mode, after which the
worker exits.

5) A cascading standby does not sync slots with sync_state = 'i' from
the first standby.

6) Changed the naptime computation logic. During each sync-cycle, if
any of the received slots was updated, we retain the default naptime;
otherwise we increase the naptime, provided the inactivity time has
reached a threshold (see the third sketch after this list).

7) Added a warning for the case where a user-created slot with the
same name that the slot-sync worker is trying to create is already
present. Sync for such slots is skipped (see the last sketch after
this list).
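
To make change (1) concrete, here is a minimal sketch of caching the
parsed GUC value in a global list from the assign hook. The names are
illustrative, and error handling (including freeing the old raw string)
is omitted; the real code is in patch001/patch003:

/* Cached, parsed form of standby_slot_names; rebuilt on each assign. */
static List *standby_slot_names_list = NIL;

void
assign_standby_slot_names(const char *newval, void *extra)
{
    MemoryContext oldcxt;
    List       *namelist = NIL;
    char       *rawname;

    /* Throw away the previous cache. */
    list_free(standby_slot_names_list);
    standby_slot_names_list = NIL;

    if (newval == NULL || *newval == '\0')
        return;

    /* Parse in a long-lived context so the cache survives this call. */
    oldcxt = MemoryContextSwitchTo(TopMemoryContext);
    rawname = pstrdup(newval);
    if (SplitIdentifierString(rawname, ',', &namelist))
        standby_slot_names_list = namelist;
    MemoryContextSwitchTo(oldcxt);
}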
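
Changes (2) and (3) together give each synced slot a small state
machine. Here is a sketch of the states and of the bounded wait during
creation; the retry constant, the helpers (primary_caught_up,
ping_primary_and_wait, set_sync_state), and the RemoteSlot type are
made up for illustration:

/* Possible values of ReplicationSlotPersistentData.sync_state */
#define SYNCSLOT_STATE_NONE       'n'   /* user-created slot */
#define SYNCSLOT_STATE_INITIATED  'i'   /* created, waiting for primary */
#define SYNCSLOT_STATE_READY      'r'   /* ready for periodic syncs */

#define PRIMARY_CATCHUP_ATTEMPTS  5     /* illustrative bound */

static void
sync_one_slot(RemoteSlot *remote)
{
    for (int attempt = 0; attempt < PRIMARY_CATCHUP_ATTEMPTS; attempt++)
    {
        if (primary_caught_up(remote))
        {
            set_sync_state(remote, SYNCSLOT_STATE_READY);
            return;
        }
        ping_primary_and_wait(remote);
    }

    /*
     * The primary has not caught up with the local slot's LSN yet: keep
     * the slot in state 'i' rather than dropping it, and move on; the
     * next sync-cycle retries the wait.
     */
    set_sync_state(remote, SYNCSLOT_STATE_INITIATED);
}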
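
Change (6) in sketch form; the constants and helper name here are
illustrative, not the patch's actual values:

#define SLOTSYNC_NAPTIME_MS               10      /* default nap */
#define SLOTSYNC_NAPTIME_INACTIVE_MS      10000   /* backed-off nap */
#define SLOTSYNC_INACTIVITY_THRESHOLD_MS  60000

static long
compute_naptime(bool any_slot_updated, TimestampTz last_update_time)
{
    /* Some slot was updated this cycle: retain the default naptime. */
    if (any_slot_updated)
        return SLOTSYNC_NAPTIME_MS;

    /* No updates for long enough: back off to the longer naptime. */
    if (TimestampDifferenceExceeds(last_update_time,
                                   GetCurrentTimestamp(),
                                   SLOTSYNC_INACTIVITY_THRESHOLD_MS))
        return SLOTSYNC_NAPTIME_INACTIVE_MS;

    return SLOTSYNC_NAPTIME_MS;
}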
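
And change (7), the warn-and-skip path, roughly as follows (the message
wording and the wrapper function are illustrative; only
SearchNamedReplicationSlot is the existing slot.c lookup):

static bool
local_slot_is_user_created(RemoteSlot *remote)
{
    ReplicationSlot *slot;

    /* Existing lookup from slot.c; takes the lock itself when asked. */
    slot = SearchNamedReplicationSlot(remote->name, true);

    if (slot != NULL && slot->data.sync_state == SYNCSLOT_STATE_NONE)
    {
        /* A user-created slot already owns this name: warn and skip. */
        ereport(WARNING,
                (errmsg("skipping sync of slot \"%s\" as it is a user-created slot",
                        remote->name)));
        return true;            /* caller moves on to the next slot */
    }
    return false;
}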

Changes for 1 are in patch001 and patch003. Changes for 2-7 are in patch002.

Thank you, Hou-san, for working on (1).

Open Question:
1) Currently I have put the drop-slot logic for slots with sync_state
= 'i' in the slot-sync worker. Do we need to put it somewhere in the
promotion logic as well? Perhaps in WaitForWALToBecomeAvailable(),
where we call XLogShutdownWalRcv() after checking
CheckForStandbyTrigger(). Thoughts?

thanks
Shveta

Attachment Content-Type Size
v32-0003-Allow-slot-sync-workers-to-wait-for-the-cascadin.patch application/octet-stream 7.9 KB
v32-0002-Add-logical-slot-sync-capability-to-the-physical.patch application/octet-stream 119.5 KB
v32-0001-Allow-logical-walsenders-to-wait-for-the-physica.patch application/octet-stream 125.7 KB
