Re: Synchronizing slots from primary to standby

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Synchronizing slots from primary to standby
Date: 2024-01-19 05:05:06
Message-ID: CAD21AoBgzONdt3o5mzbQ4MtqAE=WseiXUOq0LMqne-nWGjZBsA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 17, 2024 at 7:30 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> On Wed, Jan 17, 2024 at 3:08 PM Bertrand Drouvot
> <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> >
> > Hi,
> >
> > On Tue, Jan 16, 2024 at 05:27:05PM +0530, shveta malik wrote:
> > > PFA v62. Details:
> >
> > Thanks!
> >
> > > v62-003:
> > > It is a new patch which attempts to implement slot-sync worker as a
> > > special process which is neither a bgworker nor an Auxiliary process.
> > > Here we get the benefit of converting enable_syncslot to a PGC_SIGHUP
> > > Guc rather than PGC_POSTMASTER. We launch the slot-sync worker only if
> > > it is hot-standby and 'enable_syncslot' is ON.
> >
> > The implementation looks reasonable to me (from what I can see some parts is
> > copy/paste from an already existing "special" process and some parts are
> > "sync slot" specific) which makes fully sense.
> >
> > A few remarks:
> >
> > 1 ===
> > + * Was it the slot sycn worker?
> >
> > Typo: sycn
> >
> > 2 ===
> > + * ones), and no walwriter, autovac launcher or bgwriter or slot sync
> >
> > Instead? "* ones), and no walwriter, autovac launcher, bgwriter or slot sync"
> >
> > 3 ===
> > + * restarting slot slyc worker. If stopSignaled is set, the worker will
> >
> > Typo: slyc
> >
> > 4 ===
> > +/* Flag to tell if we are in an slot sync worker process */
> >
> > s/an/a/ ?
> >
> > 5 === (coming from v62-0002)
> > + Assert(tuplestore_tuple_count(res->tuplestore) == 1);
> >
> > Is it even possible for the related query to not return only one row? (I think the
> > "count" ensures it).
> >
> > 6 ===
> > if (conninfo_changed ||
> > primary_slotname_changed ||
> > + old_enable_syncslot != enable_syncslot ||
> > (old_hot_standby_feedback != hot_standby_feedback))
> > {
> > ereport(LOG,
> > errmsg("slot sync worker will restart because of"
> > " a parameter change"));
> >
> > I don't think "slot sync worker will restart" is true if one change enable_syncslot
> > from on to off.
> >
> > IMHO, v62-003 is in good shape and could be merged in v62-002 (that would ease
> > the review). But let's wait to see if others think differently.
> >
> > Regards,
> >
> > --
> > Bertrand Drouvot
> > PostgreSQL Contributors Team
> > RDS Open Source Databases
> > Amazon Web Services: https://aws.amazon.com
>
>
> PFA v63.
>
> --It addresses comments by Peter given in [1], [2], comment by Nisha
> given in [3], comments by Bertrand given in [4]
> --It also moves race-condition fix from patch003 to patch002 as
> suggested by Swada-san offlist. Race-condition is mentioned in [5]
>

Thank you for updating the patch. I have some comments:

---
+ latestWalEnd = GetWalRcvLatestWalEnd();
+ if (remote_slot->confirmed_lsn > latestWalEnd)
+ {
+ elog(ERROR, "exiting from slot synchronization as the
received slot sync"
+ " LSN %X/%X for slot \"%s\" is ahead of the
standby position %X/%X",
+ LSN_FORMAT_ARGS(remote_slot->confirmed_lsn),
+ remote_slot->name,
+ LSN_FORMAT_ARGS(latestWalEnd));
+ }

IIUC GetWalRcvLatestWalEnd () returns walrcv->latestWalEnd, which is
typically the primary server's flush position and doesn't mean the LSN
where the walreceiver received/flushed up to. Does it really happen
that the slot's confirmed_flush_lsn is higher than the primary's flush
lsn?

---
After dropping a database on the primary, I got the following LOG (PID
2978463 is the slotsync worker on the standby):

LOG: still waiting for backend with PID 2978463 to accept ProcSignalBarrier
CONTEXT: WAL redo at 0/301CE00 for Database/DROP: dir 1663/16384

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2024-01-19 05:20:31 Re: [PATCH] Add additional extended protocol commands to psql: \parse and \bindx
Previous Message Dilip Kumar 2024-01-19 04:54:54 Re: btree: implement dynamic prefix truncation (was: Improving btree performance through specializing by key shape, take 2)