RE: Synchronizing slots from primary to standby

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: shveta malik <shveta(dot)malik(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: RE: Synchronizing slots from primary to standby
Date: 2024-04-08 06:48:57
Message-ID: OS0PR01MB57162C119597F999895E9DAA94002@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Saturday, April 6, 2024 12:43 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Fri, Apr 5, 2024 at 8:05 PM Bertrand Drouvot
> <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> >
> > On Fri, Apr 05, 2024 at 06:23:10PM +0530, Amit Kapila wrote:
> > > On Fri, Apr 5, 2024 at 5:17 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
> > > Thinking more on this, it doesn't seem related to
> > > c9920a9068eac2e6c8fb34988d18c0b42b9bf811 as that commit doesn't
> > > change any locking or something like that which impacts write positions.
> >
> > Agree.
> >
> > > I think what has happened here is that running_xact record written
> > > by the background writer [1] is not written to the kernel or disk
> > > (see LogStandbySnapshot()), before pg_current_wal_lsn() checks the
> > > current_lsn to be compared with replayed LSN.
> >
> > Agree, I think it's not visible through pg_current_wal_lsn() yet.
> >
> > Also I think that the DEBUG message in LogCurrentRunningXacts()
> >
> > "
> > elog(DEBUG2,
> > "snapshot of %d+%d running transaction ids (lsn %X/%X oldest
> xid %u latest complete %u next xid %u)",
> > CurrRunningXacts->xcnt, CurrRunningXacts->subxcnt,
> > LSN_FORMAT_ARGS(recptr),
> > CurrRunningXacts->oldestRunningXid,
> > CurrRunningXacts->latestCompletedXid,
> > CurrRunningXacts->nextXid); "
> >
> > should be located after the XLogSetAsyncXactLSN() call. Indeed, the
> > new LSN is visible after the spinlock (XLogCtl->info_lck) in
> > XLogSetAsyncXactLSN() is released,
> >
>
> I think the new LSN can be visible only when the corresponding WAL is written
> by XLogWrite(). I don't know what in XLogSetAsyncXactLSN() can make it
> visible. In your experiment below, isn't it possible that in the meantime WAL
> writer has written that WAL due to which you are seeing an updated location?
>
> >see:
> >
> > \watch on Session 1 provides:
> >
> > pg_current_wal_lsn
> > --------------------
> > 0/87D110
> > (1 row)
> >
> > Until:
> >
> > Breakpoint 2, XLogSetAsyncXactLSN (asyncXactLSN=8900936) at
> xlog.c:2579
> > 2579 XLogRecPtr WriteRqstPtr = asyncXactLSN;
> > (gdb) n
> > 2581 bool wakeup = false;
> > (gdb)
> > 2584 SpinLockAcquire(&XLogCtl->info_lck);
> > (gdb)
> > 2585 RefreshXLogWriteResult(LogwrtResult);
> > (gdb)
> > 2586 sleeping = XLogCtl->WalWriterSleeping;
> > (gdb)
> > 2587 prevAsyncXactLSN = XLogCtl->asyncXactLSN;
> > (gdb)
> > 2588 if (XLogCtl->asyncXactLSN < asyncXactLSN)
> > (gdb)
> > 2589 XLogCtl->asyncXactLSN = asyncXactLSN;
> > (gdb)
> > 2590 SpinLockRelease(&XLogCtl->info_lck);
> > (gdb) p p/x (uint32) XLogCtl->asyncXactLSN
> > $1 = 0x87d148
> >
> > Then session 1 provides:
> >
> > pg_current_wal_lsn
> > --------------------
> > 0/87D148
> > (1 row)
> >
> > So, when we see in the log:
> >
> > 2024-04-05 04:37:05.074 UTC [3854278][background writer][:0] DEBUG:
> > snapshot of 0+0 running transaction ids (lsn 0/3000098 oldest xid 740
> > latest complete 739 next xid 740)
> > 2024-04-05 04:37:05.197 UTC [3866475][client backend][2/4:0] LOG:
> statement: SELECT '0/3000060' <= replay_lsn AND state = 'streaming'
> >
> > It's indeed possible that the new LSN was not visible yet (spinlock
> > not released?) before the query began (because we can not rely on the
> > time the DEBUG message has been emitted).
> >
> > > Note that the reason why
> > > walsender has picked the running_xact written by background writer
> > > is because it has checked after pg_current_wal_lsn() query, see LOGs [2].
> > > I think we can probably try to reproduce manually via debugger.
> > >
> > > If this theory is correct
> >
> > It think it is.
> >
> > > then I think we will need to use injection points to control the
> > > behavior of bgwriter or use the slots created via SQL API for
> > > syncing in tests.
> > >
> > > Thoughts?
> >
> > I think that maybe as a first step we should move the "elog(DEBUG2,"
> > message as proposed above to help debugging (that could help to confirm
> the above theory).
> >
>
> I think I am missing how exactly moving DEBUG2 can confirm the above theory.
>
> > If the theory is proven then I'm not sure we need the extra complexity
> > of injection point here, maybe just relying on the slots created via
> > SQL API could be enough.
> >
>
> Yeah, that could be the first step. We can probably add an injection point to
> control the bgwrite behavior and then add tests involving walsender
> performing the decoding. But I think it is important to have sufficient tests in
> this area as I see they are quite helpful in uncovering the issues.

Here is the patch to drop the subscription in the beginning so that the
restart_lsn of the lsub1_slot won't be advanced due to concurrent
xl_running_xacts from bgwriter. The subscription will be re-created after all
the slots are sync-ready. I think maybe we can use this to stabilize the test
as a first step and then think about how to make use of injection point to add
more tests if it's worth it.

Best Regards,
Hou zj

Attachment Content-Type Size
v1-0001-stablize-the-test.patch application/octet-stream 5.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Borisov 2024-04-08 07:17:51 Re: Table AM Interface Enhancements
Previous Message vignesh C 2024-04-08 06:40:02 Re: Improve tab completion for ALTER DEFAULT PRIVILEGE and ALTER TABLE