Re: Synchronizing slots from primary to standby

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Synchronizing slots from primary to standby
Date: 2023-07-27 05:24:57
Message-ID: CAA4eK1J0AAHSCxQVDA1kOjKB_PWCzk5SqrHA+FxDZCB3TTS43g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jul 26, 2023 at 10:31 AM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> On Mon, Jul 24, 2023 at 9:00 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Mon, Jul 24, 2023 at 8:03 AM Bharath Rupireddy
> > <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> > >
> > > Is having one (or a few more - not
> > > necessarily one for each logical slot) worker for all logical slots
> > > enough?
> > >
> >
> > I guess for a large number of slots the is a possibility of a large
> > gap in syncing the slots which probably means we need to retain
> > corresponding WAL for a much longer time on the primary. If we can
> > prove that the gap won't be large enough to matter then this would be
> > probably worth considering otherwise, I think we should find a way to
> > scale the number of workers to avoid the large gap.
> >
>
> How about this:
>
> 1) On standby, spawn 1 worker per database in the start (as it is
> doing currently).
>
> 2) Maintain statistics on activity against each primary's database on
> standby by any means. Could be by maintaining 'last_synced_time' and
> 'last_activity_seen time'. The last_synced_time is updated every time
> we sync/recheck slots for that particular database. The
> 'last_activity_seen_time' changes only if we get any slot on that
> database where actually confirmed_flush or say restart_lsn has changed
> from what was maintained already.
>
> 3) If at any moment, we find that 'last_synced_time' -
> 'last_activity_seen' goes beyond a threshold, that means that DB is
> not active currently. Add it to list of inactive DB
>

I think we should also increase the next_sync_time if in current sync,
there is no update.

> 4) Launcher on the other hand is always checking if it needs to spawn
> any other extra worker for any new DB. It will additionally check if
> number of inactive databases (maintained on standby) has gone higher
> (> some threshold), then it brings down the workers for those and
> starts a common worker which takes care of all such inactive databases
> (or merge all in 1), while workers for active databases remain as such
> (i.e. one per db). Each worker maintains the list of DBs which it is
> responsible for.
>
> 5) If in the list of these inactive databases, we again find any
> active database using the above logic, then the launcher will spawn a
> separate worker for that.
>

I wonder if we anyway some sort of design like this because we
shouldn't allow to spawn as many workers as the number of databases.
There has to be some existing or new GUC like max_sync_slot_workers
which decided the number of workers.

Overall, this sounds to be a more workload-adaptive approach as
compared to the current one.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message shveta malik 2023-07-27 06:43:44 Re: Synchronizing slots from primary to standby
Previous Message Amit Kapila 2023-07-27 05:02:46 Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication