Re: Synchronizing slots from primary to standby

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Synchronizing slots from primary to standby
Date: 2024-02-15 11:30:18
Message-ID: CAA4eK1JsVXp8UuGPpwYL+1BzsGebNcLXdpDce-FNOnujoz2Ztw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Feb 15, 2024 at 4:29 PM Zhijie Hou (Fujitsu)
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Thursday, February 15, 2024 5:20 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > On Thu, Feb 15, 2024 at 9:05 AM Zhijie Hou (Fujitsu) <houzj(dot)fnst(at)fujitsu(dot)com>
> > wrote:
> > >
> > > On Thursday, February 15, 2024 10:49 AM Amit Kapila
> > <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > >
> > > > On Wed, Feb 14, 2024 at 7:26 PM Bertrand Drouvot
> > > >
> > > > Right, we can do that or probably this test would have made more
> > > > sense with a worker patch where we could wait for the slot to be synced.
> > > > Anyway, let's try to recreate the slot/subscription idea. BTW, do
> > > > you think that adding a LOG when we are not able to sync will help
> > > > in debugging such problems? I think eventually we can change it to
> > > > DEBUG1 but for now, it can help with stabilizing BF and/or some other
> > > > reported issues.
> > >
> > > Here is the patch that attempts the re-create sub idea.
> > >
> >
> > Pushed this.
> >
> > >
> > > I also think that a LOG/DEBUG
> > > would be useful for such analysis, so the 0002 is to add such a log.
> > >
> >
> > I feel such a LOG would be useful.
> >
> > + ereport(LOG,
> > + errmsg("waiting for remote slot \"%s\" LSN (%X/%X) and catalog xmin"
> > + " (%u) to pass local slot LSN (%X/%X) and catalog xmin (%u)",
> >
> > I think waiting is a bit misleading here, how about something like:
> > "could not sync slot information as remote slot precedes local slot:
> > remote slot \"%s\": LSN (%X/%X), catalog xmin (%u) local slot: LSN (%X/%X),
> > catalog xmin (%u)"
>
> Changed.
>
> Attached is the v2 patch.
>
> Apart from the new log message, I think we can add one more debug message in
> reserve_wal_for_local_slot; this could be useful to analyze the failure.

Yeah, that can also be helpful, but the added message looks naive to me.
+ elog(DEBUG1, "segno: %ld oldest_segno: %ld", segno, oldest_segno);

Instead of the above, how about something like: "segno: %ld of
proposed restart_lsn for the synced slot, oldest_segno: %ld
available"?
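
For illustration only, here is a minimal standalone sketch of what such a
message would report. It assumes that a segment number is simply the LSN
divided by the WAL segment size; the typedefs are stand-ins for the
PostgreSQL types, and lsn_to_segno() and format_segno_message() are
hypothetical helpers, not functions from the patch. It uses snprintf in
place of elog so that it compiles outside the server.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-ins for PostgreSQL's 64-bit XLogRecPtr and XLogSegNo types. */
typedef uint64_t XLogRecPtr;
typedef uint64_t XLogSegNo;

/*
 * Hypothetical helper: number of the WAL segment that contains the given
 * LSN, assuming segno = lsn / segment size.
 */
static XLogSegNo
lsn_to_segno(XLogRecPtr lsn, uint64_t wal_segsz_bytes)
{
    assert(wal_segsz_bytes > 0);
    return lsn / wal_segsz_bytes;
}

/*
 * Hypothetical helper: write a debug text along the lines suggested above
 * into buf. Arguments are passed in the same order as the labels appear in
 * the format string. Returns what snprintf returns.
 */
static int
format_segno_message(char *buf, size_t buflen,
                     XLogSegNo segno, XLogSegNo oldest_segno)
{
    return snprintf(buf, buflen,
                    "segno: %" PRIu64 " of proposed restart_lsn for the "
                    "synced slot, oldest_segno: %" PRIu64 " available",
                    segno, oldest_segno);
}
```

With a 16MB segment size, an LSN of 0x3000000 falls in segment 3. If the
oldest available segment were 5, the text would show at a glance that the
WAL needed for that restart_lsn is no longer available locally.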

> And we
> can also enable the DEBUG log in the 040 tap-test. I see we have a similar
> setting in 010_logical_decoding_timelines, and logging debug1 messages doesn't
> noticeably increase the run time on my machine. These are done in 0002.
>

I haven't tested it but I think this can help in debugging BF
failures, if any. I am not sure whether to keep it like that
permanently, but until these tests are stabilized, this sounds like a
good idea. So, how about making the test changes a separate patch so
that we can easily revert/remove it later if required? Bertrand, do
you have any thoughts on this?

--
With Regards,
Amit Kapila.
