RE: Synchronizing slots from primary to standby

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: RE: Synchronizing slots from primary to standby
Date: 2024-02-16 00:32:45
Message-ID: OS0PR01MB571633C23B2A4CAC5FB0371A944C2@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thursday, February 15, 2024 8:29 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:

>
> On Thu, Feb 15, 2024 at 5:46 PM Bertrand Drouvot
> <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> >
> > On Thu, Feb 15, 2024 at 05:00:18PM +0530, Amit Kapila wrote:
> > > On Thu, Feb 15, 2024 at 4:29 PM Zhijie Hou (Fujitsu)
> > > <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> > > > Attach the v2 patch here.
> > > >
> > > > Apart from the new log message. I think we can add one more debug
> > > > message in reserve_wal_for_local_slot, this could be useful to analyze the
> failure.
> > >
> > > Yeah, that can also be helpful, but the added message looks naive to me.
> > > + elog(DEBUG1, "segno: %ld oldest_segno: %ld", oldest_segno, segno);
> > >
> > > Instead of the above, how about something like: "segno: %ld of
> > > purposed restart_lsn for the synced slot, oldest_segno: %ld
> > > available"?
> >
> > Looks good to me. I'm not sure if it would make more sense to elog
> > only if segno < oldest_segno means just before the
> XLogSegNoOffsetToRecPtr() call?
> >
> > But I'm fine with the proposed location too.
> >
>
> I am also fine either way but the current location gives required information in
> more number of cases and could be helpful in debugging this new facility.
>
> > >
> > > > And we
> > > > can also enable the DEBUG log in the 040 tap-test, I see we have
> > > > similar setting in 010_logical_decoding_timline and logging debug1
> > > > message doesn't increase noticable time on my machine. These are done
> in 0002.
> > > >
> > >
> > > I haven't tested it but I think this can help in debugging BF
> > > failures, if any. I am not sure if to keep it always like that but
> > > till the time these tests are stabilized, this sounds like a good
> > > idea. So, how, about just making test changes as a separate patch so
> > > that later if required we can revert/remove it easily? Bertrand, do
> > > you have any thoughts on this?
> >
> > +1 on having DEBUG log in the 040 tap-test until it's stabilized (I
> > +think we
> > took the same approach for 035_standby_logical_decoding.pl IIRC) and
> > then revert it back.
> >
>
> Good to know!
>
> > Also I was thinking: what about adding an output to
> pg_sync_replication_slots()?
> > The output could be the number of sync slots that have been created
> > and are not considered as sync-ready during the execution.
> >
>
> Yeah, we can consider outputting some information via this function like how
> many slots are synced and persisted but not sure what would be appropriate
> here. Because one can anyway find that or more information by querying
> pg_replication_slots. I think we can keep discussing what makes more sense as a
> return value but let's commit the debug/log patches as they will be helpful to
> analyze BF failures or any other issues reported.

Agreed. Here is new patch set as suggested. I used debug2 in the 040 as it
could provide more information about communication between primary and standby.
This also doesn't increase noticeable testing time on my machine for debug
build.

Best Regards,
Hou zj

Attachment Content-Type Size
v3-0002-Add-a-DEBUG1-message-for-slot-sync.patch application/octet-stream 955 bytes
v3-0001-Add-a-log-if-remote-slot-didn-t-catch-up-to-local.patch application/octet-stream 1.4 KB
v3-0003-change-the-log-level-of-sync-slot-test-to-debug2.patch application/octet-stream 1.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tatsuo Ishii 2024-02-16 00:43:31 Re: When extended query protocol ends?
Previous Message Jeff Davis 2024-02-16 00:13:19 Re: Built-in CTYPE provider