Re: Synchronizing slots from primary to standby

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: shveta malik <shveta(dot)malik(at)gmail(dot)com>, "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Synchronizing slots from primary to standby
Date: 2023-07-28 14:39:27
Message-ID: CALj2ACVvM8omB6FsukxGJeRT0x8zH9VQO-3hHz+WiMCAZzFJeQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jul 24, 2023 at 9:00 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> > 2. All candidate standbys will start one slot sync worker per logical
> > slot which might not be scalable.
>
> Yeah, that doesn't sound like a good idea but IIRC, the proposed patch
> is using one worker per database (for all slots corresponding to a
> database).

Right. It's based on one worker for each database.

> > Is having one (or a few more - not
> > necessarily one for each logical slot) worker for all logical slots
> > enough?
>
> I guess for a large number of slots the is a possibility of a large
> gap in syncing the slots which probably means we need to retain
> corresponding WAL for a much longer time on the primary. If we can
> prove that the gap won't be large enough to matter then this would be
> probably worth considering otherwise, I think we should find a way to
> scale the number of workers to avoid the large gap.

I think the gap is largely determined by the time taken to advance
each slot and the amount of WAL that each logical slot moves ahead on
primary. I've measured the time it takes for
pg_logical_replication_slot_advance with different amounts WAL on my
system. It took 2595ms/5091ms/31238ms to advance the slot by
3.7GB/7.3GB/13GB respectively. To put things into perspective here,
imagine there are 3 logical slots to sync for a single slot sync
worker and each of them are in need of advancing the slot by
3.7GB/7.3GB/13GB of WAL. The slot sync worker gets to slot 1 again
after 2595ms+5091ms+31238ms (~40sec), gets to slot 2 again after
advance time of slot 1 with amount of WAL that the slot has moved
ahead on primary during 40sec, gets to slot 3 again after advance time
of slot 1 and slot 2 with amount of WAL that the slot has moved ahead
on primary and so on. If WAL generation on the primary is pretty fast,
and if the logical slot moves pretty fast on the primary, the time it
takes for a single sync worker to sync a slot can increase.

Now, let's think what happens if there's a large gap, IOW, a logical
slot on standby is behind X amount of WAL from that of the logical
slot on primary. The standby needs to retain more WAL for sure. IIUC,
primary doesn't need to retain the WAL required for a logical slot on
standby, no?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Melanie Plageman 2023-07-28 15:12:48 Eager page freeze criteria clarification
Previous Message Anthonin Bonnefoy 2023-07-28 14:06:20 Re: POC: Extension for adding distributed tracing - pg_tracing