RE: Synchronizing slots from primary to standby

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: RE: Synchronizing slots from primary to standby
Date: 2024-02-16 05:41:55
Message-ID: OS0PR01MB5716CB015BAD807B29BC55BE944C2@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Friday, February 16, 2024 8:33 AM Zhijie Hou (Fujitsu) <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> On Thursday, February 15, 2024 8:29 PM Amit Kapila
> <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> >
> > On Thu, Feb 15, 2024 at 5:46 PM Bertrand Drouvot
> > <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> > >
> > > On Thu, Feb 15, 2024 at 05:00:18PM +0530, Amit Kapila wrote:
> > > > On Thu, Feb 15, 2024 at 4:29 PM Zhijie Hou (Fujitsu)
> > > > <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> > > > > Attach the v2 patch here.
> > > > >
> > > > > Apart from the new log message. I think we can add one more
> > > > > debug message in reserve_wal_for_local_slot, this could be
> > > > > useful to analyze the
> > failure.
> > > >
> > > > Yeah, that can also be helpful, but the added message looks naive to me.
> > > > + elog(DEBUG1, "segno: %ld oldest_segno: %ld", oldest_segno,
> > > > + segno);
> > > >
> > > > Instead of the above, how about something like: "segno: %ld of
> > > > purposed restart_lsn for the synced slot, oldest_segno: %ld
> > > > available"?
> > >
> > > Looks good to me. I'm not sure if it would make more sense to elog
> > > only if segno < oldest_segno means just before the
> > XLogSegNoOffsetToRecPtr() call?
> > >
> > > But I'm fine with the proposed location too.
> > >
> >
> > I am also fine either way but the current location gives required
> > information in more number of cases and could be helpful in debugging this
> new facility.
> >
> > > >
> > > > > And we
> > > > > can also enable the DEBUG log in the 040 tap-test, I see we have
> > > > > similar setting in 010_logical_decoding_timline and logging
> > > > > debug1 message doesn't increase noticable time on my machine.
> > > > > These are done
> > in 0002.
> > > > >
> > > >
> > > > I haven't tested it but I think this can help in debugging BF
> > > > failures, if any. I am not sure if to keep it always like that but
> > > > till the time these tests are stabilized, this sounds like a good
> > > > idea. So, how, about just making test changes as a separate patch
> > > > so that later if required we can revert/remove it easily?
> > > > Bertrand, do you have any thoughts on this?
> > >
> > > +1 on having DEBUG log in the 040 tap-test until it's stabilized (I
> > > +think we
> > > took the same approach for 035_standby_logical_decoding.pl IIRC) and
> > > then revert it back.
> > >
> >
> > Good to know!
> >
> > > Also I was thinking: what about adding an output to
> > pg_sync_replication_slots()?
> > > The output could be the number of sync slots that have been created
> > > and are not considered as sync-ready during the execution.
> > >
> >
> > Yeah, we can consider outputting some information via this function
> > like how many slots are synced and persisted but not sure what would
> > be appropriate here. Because one can anyway find that or more
> > information by querying pg_replication_slots. I think we can keep
> > discussing what makes more sense as a return value but let's commit
> > the debug/log patches as they will be helpful to analyze BF failures or any
> other issues reported.
>
> Agreed. Here is new patch set as suggested. I used debug2 in the 040 as it could
> provide more information about communication between primary and standby.
> This also doesn't increase noticeable testing time on my machine for debug
> build.

Sorry, there was a miss in the DEBUG message, I should have used
UINT64_FORMAT for XLogSegNo(uint64) instead of %ld. Here is a small patch
to fix this.

Best Regards,
Hou zj

Attachment Content-Type Size
0001-fix-the-format-for-uint64.patch application/octet-stream 958 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2024-02-16 06:13:54 Re: Synchronizing slots from primary to standby
Previous Message Andrei Lepikhov 2024-02-16 05:32:36 Re: POC, WIP: OR-clause support for indexes