Re: Synchronizing slots from primary to standby

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Synchronizing slots from primary to standby
Date: 2024-01-17 01:13:19
Message-ID: CAD21AoA5izeKpp9Ei4Cd745pKX3wn-TRvhhmPFEW9UY1nx+_aw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jan 16, 2024 at 6:40 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> On Tue, Jan 16, 2024 at 12:59 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Tue, Jan 16, 2024 at 1:07 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Tue, Jan 16, 2024 at 9:03 AM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> > > >
> > > > On Sat, Jan 13, 2024 at 12:54 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > > >
> > > > > On Fri, Jan 12, 2024 at 5:50 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> > > > > >
> > > > > > There are multiple approaches discussed and tried when it comes to
> > > > > > starting a slot-sync worker. I am summarizing all here:
> > > > > >
> > > > > > 1) Make slotsync worker as an Auxiliary Process (like checkpointer,
> > > > > > walwriter, walreceiver etc). The benefit this approach provides is, it
> > > > > > can control begin and stop in a more flexible way as each auxiliary
> > > > > > process could have different checks before starting and can have
> > > > > > different stop conditions. But it needs code duplication for process
> > > > > > management(start, stop, crash handling, signals etc) and currently it
> > > > > > does not support db-connection smoothly (none of the auxiliary process
> > > > > > has one so far)
> > > > > >
> > > > >
> > > > > As slotsync worker needs to perform transactions and access syscache,
> > > > > we can't make it an auxiliary process as that doesn't initialize the
> > > > > required stuff like syscache. Also, see the comment "Auxiliary
> > > > > processes don't run transactions ..." in AuxiliaryProcessMain() which
> > > > > means this is not an option.
> > > > >
> > > > > >
> > > > > > 2) Make slotsync worker as a 'special' process like AutoVacLauncher
> > > > > > which is neither an Auxiliary process nor a bgworker one. It allows
> > > > > > db-connection and also provides flexibility to have start and stop
> > > > > > conditions for a process.
> > > > > >
> > > > >
> > > > > Yeah, due to these reasons, I think this option is worth considering
> > > > > and another plus point is that this allows us to make enable_syncslot
> > > > > a PGC_SIGHUP GUC rather than a PGC_POSTMASTER.
> > > > >
> > > > > >
> > > > > > 3) Make slotysnc worker a bgworker. Here we just need to register our
> > > > > > process as a bgworker (RegisterBackgroundWorker()) by providing a
> > > > > > relevant start_time and restart_time and then the process management
> > > > > > is well taken care of. It does not need any code-duplication and
> > > > > > allows db-connection smoothly in registered process. The only thing it
> > > > > > lacks is that it does not provide flexibility of having
> > > > > > start-condition which then makes us to have 'enable_syncslot' as
> > > > > > PGC_POSTMASTER parameter rather than PGC_SIGHUP. Having said this, I
> > > > > > feel enable_syncslot is something which will not be changed frequently
> > > > > > and with the benefits provided by bgworker infra, it seems a
> > > > > > reasonably good option to choose this approach.
> > > > > >
> > > > >
> > > > > I agree but it may be better to make it a PGC_SIGHUP parameter.
> > > > >
> > > > > > 4) Another option is to have Logical Replication Launcher(or a new
> > > > > > process) to launch slot-sync worker. But going by the current design
> > > > > > where we have only 1 slotsync worker, it may be an overhead to have an
> > > > > > additional manager process maintained.
> > > > > >
> > > > >
> > > > > I don't see any good reason to have an additional launcher process here.
> > > > >
> > > > > >
> > > > > > Thus weighing pros and cons of all these options, we have currently
> > > > > > implemented the bgworker approach (approach 3). Any feedback is
> > > > > > welcome.
> > > > > >
> > > > >
> > > > > I vote to go for (2) unless we face difficulties in doing so but (3)
> > > > > is also okay especially if others also think so.
> > > >
> > > > I am not against any of the approaches but I still feel that when we
> > > > have a standard way of doing things (bgworker) we should not keep
> > > > adding code to do things in a special way unless there is a strong
> > > > reason to do so. Now we need to decide if 'enable_syncslot' being
> > > > PGC_POSTMASTER is a strong reason to go the non-standard way?
> > > >
> > >
> > > Agreed and as said earlier I think it is better to make it a
> > > PGC_SIGHUP. Also, not sure we can say it is a non-standard way as
> > > already autovacuum launcher is handled in the same way. One more minor
> > > thing is it will save us for having a new bgworker state
> > > BgWorkerStart_ConsistentState_HotStandby as introduced by this patch.
> >
> > Why do we need to add a new BgWorkerStart_ConsistentState_HotStandby
> > for the slotsync worker? Isn't it sufficient that the slotsync worker
> > exits if not in hot standby mode?
>
> It is doable, but that will mean starting slot-sync worker even on
> primary on every server restart which does not seem like a good idea.
> We wanted to have a way where-in it does not start itself in
> non-standby mode.

Understood.

Another idea would be that the startup process dynamically registers
the slotsync worker if hot_standby is enabled. But it doesn't seem
like the right approach.

>
> > Is there any technical difficulty or obstacle to make the slotsync
> > worker start using bgworker after reloading the config file?
>
> When we register slotsync worker as bgworker, we can only register the
> bgworker before initializing shared memory, we cannot register
> dynamically in the cycle of ServerLoop and thus we do not have
> flexibility of registering/deregistering the bgworker (or controlling
> the bgworker start) based on config parameters each time they change.
> We can always start slot-sync worker and let it check if
> enable_syncslot is ON. If not, exit and retry the next time when
> postmaster will restart it after restart_time(60sec). The downside of
> this approach is, even if any user does not want slot-sync
> functionality and thus has permanently disabled 'enable_syncslot', it
> will keep on restarting and exiting there.

Thanks for the explanation. It sounds like it's not impossible but
would require some work. If allowing bgworkers to start also on SIGUP
is a general improvement, we can implement it later while having
enable_syncslot PGC_POSTMASTER at this time. Then, we will be able to
make the enable_syncslot PGC_SIGUP later.

BTW I think I found a race condition in the v61 patch to cause that
the slotsync worker continues working even after promotion (I've not
tested with v62 patch though). At the time when the startup shutdown
the slotsync worker in FinishWalRecovery(), the postmaster's pmState
is still PM_HOT_STANDBY. And if the slotsync worker is not running
when the startup process attempts to shutdown it, ShutDownSlotSync()
does nothing. Therefore, after the startup process doesn't shutdown
the slotsync worker, the postmaster could relaunch the slotsync worker
before its state transition to PM_RUN.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2024-01-17 01:18:37 Re: reorganize "Shared Memory and LWLocks" section of docs
Previous Message Richard Guo 2024-01-17 01:10:33 Re: Fix a typo of func DecodeInsert()