Re: Synchronizing slots from primary to standby

From: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Synchronizing slots from primary to standby
Date: 2024-02-15 12:16:48
Message-ID: Zc4AsF9FJPDW0iDR@ip-10-97-1-34.eu-west-3.compute.internal
Lists: pgsql-hackers

Hi,

On Thu, Feb 15, 2024 at 05:00:18PM +0530, Amit Kapila wrote:
> On Thu, Feb 15, 2024 at 4:29 PM Zhijie Hou (Fujitsu)
> <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> > Attach the v2 patch here.
> >
> > Apart from the new log message. I think we can add one more debug message in
> > reserve_wal_for_local_slot, this could be useful to analyze the failure.
>
> Yeah, that can also be helpful, but the added message looks naive to me.
> + elog(DEBUG1, "segno: %ld oldest_segno: %ld", oldest_segno, segno);
>
> Instead of the above, how about something like: "segno: %ld of
> purposed restart_lsn for the synced slot, oldest_segno: %ld
> available"?

Looks good to me. I wonder whether it would make more sense to elog only if
segno < oldest_segno, i.e. just before the XLogSegNoOffsetToRecPtr() call?

But I'm fine with the proposed location too.
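To make the alternative placement concrete, here is a minimal standalone sketch (not the patch itself). It assumes plain uint64_t stand-ins for the WAL types, and the helper names lsn_to_segno(), segno_to_lsn() and choose_restart_lsn() are hypothetical; they only mimic the arithmetic of XLByteToSeg() and XLogSegNoOffsetToRecPtr(). The message wording is adapted from Amit's suggestion, the arguments are passed in the order the format string names them, and PRIu64 is used instead of %ld since the values are 64-bit unsigned.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for PostgreSQL's XLogSegNo and XLogRecPtr. */
typedef uint64_t SegNo;
typedef uint64_t RecPtr;

/* Same arithmetic as XLByteToSeg(): the segment that contains this LSN. */
static SegNo
lsn_to_segno(RecPtr lsn, uint64_t seg_size)
{
    assert(seg_size > 0);
    return lsn / seg_size;
}

/* Same arithmetic as XLogSegNoOffsetToRecPtr() with offset 0. */
static RecPtr
segno_to_lsn(SegNo segno, uint64_t seg_size)
{
    return segno * seg_size;
}

/*
 * Return the restart LSN to use for the local slot.
 *
 * The debug message is written into msg only when the proposed segment has
 * already been removed, i.e. only on the path that falls back to the oldest
 * available segment. Otherwise msg is left empty. msglen must be > 0.
 */
static RecPtr
choose_restart_lsn(RecPtr proposed, SegNo oldest_segno, uint64_t seg_size,
                   char *msg, size_t msglen)
{
    SegNo segno = lsn_to_segno(proposed, seg_size);

    msg[0] = '\0';

    /* All required WAL is still there: nothing to log, keep the LSN. */
    if (segno >= oldest_segno)
        return proposed;

    /* This is where the conditional elog(DEBUG1, ...) would sit. */
    snprintf(msg, msglen,
             "segno: %" PRIu64 " of proposed restart_lsn for the synced slot, "
             "oldest_segno: %" PRIu64 " available",
             segno, oldest_segno);

    /* Retry using the location of the oldest WAL segment. */
    return segno_to_lsn(oldest_segno, seg_size);
}
```

The trade-off is that logging unconditionally also records the values when everything is fine, which may still help when comparing runs.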

>
> > And we
> > can also enable the DEBUG log in the 040 tap-test, I see we have similar
> > setting in 010_logical_decoding_timline and logging debug1 message doesn't
> > increase noticable time on my machine. These are done in 0002.
> >
>
> I haven't tested it but I think this can help in debugging BF
> failures, if any. I am not sure if to keep it always like that but
> till the time these tests are stabilized, this sounds like a good
> idea. So, how, about just making test changes as a separate patch so
> that later if required we can revert/remove it easily? Bertrand, do
> you have any thoughts on this?

+1 on having the DEBUG log in the 040 tap-test until it's stabilized (IIRC we
took the same approach for 035_standby_logical_decoding.pl) and then reverting
it.

Also I was thinking: what about adding an output to pg_sync_replication_slots()?
The output could be the number of synced slots that have been created but are
not yet considered sync-ready during the execution. I think that could be a good
addition to the v2-0001-Add-a-log-if-remote-slot-didn-t-catch-up-to-local.patch
proposed here (a nonzero value should trigger special attention).
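As a rough illustration of what such an output would count, here is a standalone sketch. The struct SyncedSlotInfo, its fields and count_not_sync_ready() are hypothetical simplifications for this mail, not existing PostgreSQL definitions, and pg_sync_replication_slots() does not return such a value today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a replication slot on the standby. */
typedef struct SyncedSlotInfo
{
    const char *name;
    bool        synced;     /* created by slot synchronization */
    bool        sync_ready; /* remote slot caught up, slot usable after failover */
} SyncedSlotInfo;

/*
 * Count the slots that were created by synchronization but are not yet
 * sync-ready. Slots that were not created by synchronization are ignored.
 */
static int
count_not_sync_ready(const SyncedSlotInfo *slots, size_t nslots)
{
    int pending = 0;

    assert(slots != NULL || nslots == 0);

    for (size_t i = 0; i < nslots; i++)
    {
        if (slots[i].synced && !slots[i].sync_ready)
            pending++;
    }

    return pending;
}
```

A caller would then only need to check for a nonzero result to know that at least one synced slot still needs attention.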

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
