Re: Synchronizing slots from primary to standby

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: Synchronizing slots from primary to standby
Date: 2024-01-16 11:57:05
Message-ID: CAJpy0uCJ7MKgZnKLAACd-AXN0VbWM5gVJ+GRJ1Za_A2UmF3R0A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jan 16, 2024 at 3:10 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> On Tue, Jan 16, 2024 at 12:59 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Tue, Jan 16, 2024 at 1:07 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Tue, Jan 16, 2024 at 9:03 AM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> > > >
> > > > On Sat, Jan 13, 2024 at 12:54 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > > >
> > > > > On Fri, Jan 12, 2024 at 5:50 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> > > > > >
> > > > > > There are multiple approaches discussed and tried when it comes to
> > > > > > starting a slot-sync worker. I am summarizing all here:
> > > > > >
> > > > > > 1) Make slotsync worker as an Auxiliary Process (like checkpointer,
> > > > > > walwriter, walreceiver etc). The benefit this approach provides is, it
> > > > > > can control begin and stop in a more flexible way as each auxiliary
> > > > > > process could have different checks before starting and can have
> > > > > > different stop conditions. But it needs code duplication for process
> > > > > > management(start, stop, crash handling, signals etc) and currently it
> > > > > > does not support db-connection smoothly (none of the auxiliary process
> > > > > > has one so far)
> > > > > >
> > > > >
> > > > > As slotsync worker needs to perform transactions and access syscache,
> > > > > we can't make it an auxiliary process as that doesn't initialize the
> > > > > required stuff like syscache. Also, see the comment "Auxiliary
> > > > > processes don't run transactions ..." in AuxiliaryProcessMain() which
> > > > > means this is not an option.
> > > > >
> > > > > >
> > > > > > 2) Make slotsync worker as a 'special' process like AutoVacLauncher
> > > > > > which is neither an Auxiliary process nor a bgworker one. It allows
> > > > > > db-connection and also provides flexibility to have start and stop
> > > > > > conditions for a process.
> > > > > >
> > > > >
> > > > > Yeah, due to these reasons, I think this option is worth considering
> > > > > and another plus point is that this allows us to make enable_syncslot
> > > > > a PGC_SIGHUP GUC rather than a PGC_POSTMASTER.
> > > > >
> > > > > >
> > > > > > 3) Make slotysnc worker a bgworker. Here we just need to register our
> > > > > > process as a bgworker (RegisterBackgroundWorker()) by providing a
> > > > > > relevant start_time and restart_time and then the process management
> > > > > > is well taken care of. It does not need any code-duplication and
> > > > > > allows db-connection smoothly in registered process. The only thing it
> > > > > > lacks is that it does not provide flexibility of having
> > > > > > start-condition which then makes us to have 'enable_syncslot' as
> > > > > > PGC_POSTMASTER parameter rather than PGC_SIGHUP. Having said this, I
> > > > > > feel enable_syncslot is something which will not be changed frequently
> > > > > > and with the benefits provided by bgworker infra, it seems a
> > > > > > reasonably good option to choose this approach.
> > > > > >
> > > > >
> > > > > I agree but it may be better to make it a PGC_SIGHUP parameter.
> > > > >
> > > > > > 4) Another option is to have Logical Replication Launcher(or a new
> > > > > > process) to launch slot-sync worker. But going by the current design
> > > > > > where we have only 1 slotsync worker, it may be an overhead to have an
> > > > > > additional manager process maintained.
> > > > > >
> > > > >
> > > > > I don't see any good reason to have an additional launcher process here.
> > > > >
> > > > > >
> > > > > > Thus weighing pros and cons of all these options, we have currently
> > > > > > implemented the bgworker approach (approach 3). Any feedback is
> > > > > > welcome.
> > > > > >
> > > > >
> > > > > I vote to go for (2) unless we face difficulties in doing so but (3)
> > > > > is also okay especially if others also think so.
> > > >
> > > > I am not against any of the approaches but I still feel that when we
> > > > have a standard way of doing things (bgworker) we should not keep
> > > > adding code to do things in a special way unless there is a strong
> > > > reason to do so. Now we need to decide if 'enable_syncslot' being
> > > > PGC_POSTMASTER is a strong reason to go the non-standard way?
> > > >
> > >
> > > Agreed and as said earlier I think it is better to make it a
> > > PGC_SIGHUP. Also, not sure we can say it is a non-standard way as
> > > already autovacuum launcher is handled in the same way. One more minor
> > > thing is it will save us for having a new bgworker state
> > > BgWorkerStart_ConsistentState_HotStandby as introduced by this patch.
> >
> > Why do we need to add a new BgWorkerStart_ConsistentState_HotStandby
> > for the slotsync worker? Isn't it sufficient that the slotsync worker
> > exits if not in hot standby mode?
>
> It is doable, but that will mean starting slot-sync worker even on
> primary on every server restart which does not seem like a good idea.
> We wanted to have a way where-in it does not start itself in
> non-standby mode.
>
> > Is there any technical difficulty or obstacle to make the slotsync
> > worker start using bgworker after reloading the config file?
>
> When we register slotsync worker as bgworker, we can only register the
> bgworker before initializing shared memory, we cannot register
> dynamically in the cycle of ServerLoop and thus we do not have
> flexibility of registering/deregistering the bgworker (or controlling
> the bgworker start) based on config parameters each time they change.
> We can always start slot-sync worker and let it check if
> enable_syncslot is ON. If not, exit and retry the next time when
> postmaster will restart it after restart_time(60sec). The downside of
> this approach is, even if any user does not want slot-sync
> functionality and thus has permanently disabled 'enable_syncslot', it
> will keep on restarting and exiting there.

PFA v62. Details:

v62-001: No change.

v62-002:
1) Addressed slotsync.c related comments by Peter in [1].
2) Addressed CFBot failure where there was a crash in 32 bit env while
accessing DatumGetLSN
3) Addressed another CFBot failure where the test for
'050_standby_failover_slots_sync.pl' was hanging. Thanks Hou-San for
this fix.

v62-003:
It is a new patch which attempts to implement slot-sync worker as a
special process which is neither a bgworker nor an Auxiliary process.
Here we get the benefit of converting enable_syncslot to a PGC_SIGHUP
Guc rather than PGC_POSTMASTER. We launch the slot-sync worker only if
it is hot-standby and 'enable_syncslot' is ON.

v62-004:
Small change in document.

v62-005: No change

v62-006:
Separated the failover-ready validation steps into this separate
doc-patch (which were earlier present in v61-002 and v61-003). Also
addressed some of the doc comments by Peter in [1].
Thanks Hou-San for providing this patch.

[1]: https://www.postgresql.org/message-id/CAHut%2BPteZVNx1jQ6Hs3mEdoC%3DDNALVpJJ2mZDYim7sU-04tiaw%40mail.gmail.com

thanks
Shveta

Attachment Content-Type Size
v62-0004-Allow-logical-walsenders-to-wait-for-the-physica.patch application/octet-stream 40.3 KB
v62-0005-Non-replication-connection-and-app_name-change.patch application/octet-stream 9.7 KB
v62-0002-Add-logical-slot-sync-capability-to-the-physical.patch application/octet-stream 82.7 KB
v62-0001-Enable-setting-failover-property-for-a-slot-thro.patch application/octet-stream 100.3 KB
v62-0003-Slot-sync-worker-as-a-special-process.patch application/octet-stream 36.3 KB
v62-0006-Document-the-steps-to-check-if-the-standby-is-re.patch application/octet-stream 5.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Zhijie Hou (Fujitsu) 2024-01-16 12:00:29 RE: Synchronizing slots from primary to standby
Previous Message Heikki Linnakangas 2024-01-16 11:54:53 Re: ResourceOwner refactoring