Re: Synchronizing slots from primary to standby

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: shveta malik <shveta(dot)malik(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Synchronizing slots from primary to standby
Date: 2024-04-06 04:43:00
Message-ID: CAA4eK1JxM5_0H+QNZ9QW=dU0OqSegiGPF5prY4BkOST-eb5_Sg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Apr 5, 2024 at 8:05 PM Bertrand Drouvot
<bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
>
> On Fri, Apr 05, 2024 at 06:23:10PM +0530, Amit Kapila wrote:
> > On Fri, Apr 5, 2024 at 5:17 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > Thinking more on this, it doesn't seem related to
> > c9920a9068eac2e6c8fb34988d18c0b42b9bf811 as that commit doesn't change
> > any locking or something like that which impacts write positions.
>
> Agree.
>
> > I think what has happened here is that running_xact record written by
> > the background writer [1] is not written to the kernel or disk (see
> > LogStandbySnapshot()), before pg_current_wal_lsn() checks the
> > current_lsn to be compared with replayed LSN.
>
> Agree, I think it's not visible through pg_current_wal_lsn() yet.
>
> Also I think that the DEBUG message in LogCurrentRunningXacts()
>
> "
> elog(DEBUG2,
> "snapshot of %d+%d running transaction ids (lsn %X/%X oldest xid %u latest complete %u next xid %u)",
> CurrRunningXacts->xcnt, CurrRunningXacts->subxcnt,
> LSN_FORMAT_ARGS(recptr),
> CurrRunningXacts->oldestRunningXid,
> CurrRunningXacts->latestCompletedXid,
> CurrRunningXacts->nextXid);
> "
>
> should be located after the XLogSetAsyncXactLSN() call. Indeed, the new LSN is
> visible after the spinlock (XLogCtl->info_lck) in XLogSetAsyncXactLSN() is
> released,
>

I think the new LSN can be visible only when the corresponding WAL is
written by XLogWrite(). I don't know what in XLogSetAsyncXactLSN() can
make it visible. In your experiment below, isn't it possible that in
the meantime WAL writer has written that WAL due to which you are
seeing an updated location?

>see:
>
> \watch on Session 1 provides:
>
> pg_current_wal_lsn
> --------------------
> 0/87D110
> (1 row)
>
> Until:
>
> Breakpoint 2, XLogSetAsyncXactLSN (asyncXactLSN=8900936) at xlog.c:2579
> 2579 XLogRecPtr WriteRqstPtr = asyncXactLSN;
> (gdb) n
> 2581 bool wakeup = false;
> (gdb)
> 2584 SpinLockAcquire(&XLogCtl->info_lck);
> (gdb)
> 2585 RefreshXLogWriteResult(LogwrtResult);
> (gdb)
> 2586 sleeping = XLogCtl->WalWriterSleeping;
> (gdb)
> 2587 prevAsyncXactLSN = XLogCtl->asyncXactLSN;
> (gdb)
> 2588 if (XLogCtl->asyncXactLSN < asyncXactLSN)
> (gdb)
> 2589 XLogCtl->asyncXactLSN = asyncXactLSN;
> (gdb)
> 2590 SpinLockRelease(&XLogCtl->info_lck);
> (gdb) p p/x (uint32) XLogCtl->asyncXactLSN
> $1 = 0x87d148
>
> Then session 1 provides:
>
> pg_current_wal_lsn
> --------------------
> 0/87D148
> (1 row)
>
> So, when we see in the log:
>
> 2024-04-05 04:37:05.074 UTC [3854278][background writer][:0] DEBUG: snapshot of 0+0 running transaction ids (lsn 0/3000098 oldest xid 740 latest complete 739 next xid 740)
> 2024-04-05 04:37:05.197 UTC [3866475][client backend][2/4:0] LOG: statement: SELECT '0/3000060' <= replay_lsn AND state = 'streaming'
>
> It's indeed possible that the new LSN was not visible yet (spinlock not released?)
> before the query began (because we can not rely on the time the DEBUG message has
> been emitted).
>
> > Note that the reason why
> > walsender has picked the running_xact written by background writer is
> > because it has checked after pg_current_wal_lsn() query, see LOGs [2].
> > I think we can probably try to reproduce manually via debugger.
> >
> > If this theory is correct
>
> It think it is.
>
> > then I think we will need to use injection
> > points to control the behavior of bgwriter or use the slots created
> > via SQL API for syncing in tests.
> >
> > Thoughts?
>
> I think that maybe as a first step we should move the "elog(DEBUG2," message as
> proposed above to help debugging (that could help to confirm the above theory).
>

I think I am missing how exactly moving DEBUG2 can confirm the above theory.

> If the theory is proven then I'm not sure we need the extra complexity of
> injection point here, maybe just relying on the slots created via SQL API could
> be enough.
>

Yeah, that could be the first step. We can probably add an injection
point to control the bgwrite behavior and then add tests involving
walsender performing the decoding. But I think it is important to have
sufficient tests in this area as I see they are quite helpful in
uncovering the issues.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2024-04-06 05:00:25 Re: remaining sql/json patches
Previous Message Tom Lane 2024-04-06 04:05:28 Re: Statistics Import and Export