Re: pgsql: Improve runtime and output of tests for replication slots checkp

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, pgsql-committers(at)lists(dot)postgresql(dot)org
Subject: Re: pgsql: Improve runtime and output of tests for replication slots checkp
Date: 2025-06-23 12:53:25
Message-ID: CALDaNm1t0PZUjvAsJ3jRK_S49OmmLhqWjQC+jdidxBmZTj1FXg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-committers

On Sun, 22 Jun 2025 at 05:46, Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
>
> On Sat, Jun 21, 2025 at 2:42 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >
> > Alexander Korotkov <aekorotkov(at)gmail(dot)com> writes:
> > > And I see the following variable values.
> >
> > > (lldb) p/x targetPagePtr
> > > (XLogRecPtr) 0x0000000029004000
> > > (lldb) p/x RecPtr
> > > (XLogRecPtr) 0x0000000029002138
> >
> > > I hardly understand how is this possible given it was compiled with "-O0".
> > > I'm planning to continue investigating this tomorrow.
> >
> > Yeah, I see
> >
> > (lldb) p/x targetPagePtr
> > (XLogRecPtr) 0x0000000029004000
> > (lldb) p/x RecPtr
> > (XLogRecPtr) 0x0000000029002138
> > (lldb) p/x RecPtr - (RecPtr % 8192)
> > (XLogRecPtr) 0x0000000029002000
> >
> > We're here:
> >
> > /* Calculate pointer to beginning of next page */
> > targetPagePtr += XLOG_BLCKSZ;
> >
> > /* Wait for the next page to become available */
> > readOff = ReadPageInternal(state, targetPagePtr,
> > Min(total_len - gotlen + SizeOfXLogShortPHD,
> > XLOG_BLCKSZ));
> >
> > so that's where the increment of targetPagePtr came from.
> > But "Wait for the next page to become available" seems awfully
> > trusting that there will be another page. Should this be
> > using the no-wait code path?
>
> Thank you for the help. It seems to me that problem is deeper. The
> code seems to only trying to read till the end of given WAL record,
> but can't reach it. According to the values I've seen in XLogCtl, it
> seems that RedoRecPtr points somewhere inside of that record's body.
> I don't feel confident about to understand what's going on and how to
> fix it.
>
> I've tried two things.
> 1) slot_tests_wait_for_checkpoint.patch
> Make tests wait for checkpoint completion (as I think they were
> originally intended). However, the problem still persists.
> 2) revert_slot_last_saved_restart_lsn.patch
> Revert ca307d5cec90 and make new tests reserve WAL using wal_keep_size
> GUC. The problem still persists. It seems to be some problem
> independent to my attempts to fix retaining WAL files with slot's
> restart_lsn. The new tests just spotted the existing bug.

I was able to reproduce this issue after a few runs. At the time of
failure, the last few WAL records were as follows:
mgr: XLOG len (rec/tot): 114/ 114, tx: 0, lsn:
0/290020C0, prev 0/29002088, desc: CHECKPOINT_ONLINE redo 0/29002068;
tli 1; prev tli 1; fpw true; wal_level logical; xid 0:753; oid 24576;
multi 1; offset 0; oldest xid 744 in DB 1; oldest multi 1 in DB 1;
oldest/newest commit timestamp xid: 0/0; oldest running xid 753;
online
rmgr: XLOG len (rec/tot): 42/ 42, tx: 0, lsn:
0/29004018, prev 0/290020C0, desc: OVERWRITE_CONTRECORD lsn
0/29002138; time 2025-06-23 11:28:22.748866 IST
rmgr: XLOG len (rec/tot): 114/ 114, tx: 0, lsn:
0/29004048, prev 0/29004018, desc: CHECKPOINT_SHUTDOWN redo
0/29004048; tli 1; prev tli 1; fpw true; wal_level logical; xid 0:753;
oid 24576; multi 1; offset 0; oldest xid 744 in DB 1; oldest multi 1
in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid
0; shutdown
rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn:
0/290040C0, prev 0/29004048, desc: RUNNING_XACTS nextXid 753
latestCompletedXid 752 oldestRunningXid 753

When trying to read the CHECKPOINT_ONLINE record at LSN 0/290020C0,
the system determines that the total record length is 8133 bytes, we
can get these values from XLogDecodeNextRecord function:
(gdb) p total_len
$111 = 8133

Based on this, it expects to read 277 bytes from the next page:
(gdb) p gotlen
$112 = 7880 # Data available in the current page
(gdb) p SizeOfXLogShortPHD
$113 = 24
(gdb) p total_len - gotlen + SizeOfXLogShortPHD
$115 = 277 # Data expected from the next page

However, in the read_local_xlog_page_guts function, it finds that WAL
data is available only up to 0x290040F8, whereas it needs data up to
0x29004115. As a result, it continuously waits for more WAL data to
appear.

One thing I'm not sure about is why the CHECKPOINT_ONLINE record is
reported as requiring 8133 bytes. This seems unusually large for a
checkpoint record, and I plan to investigate this further. Thoughts?
Another key point is the presence of an OVERWRITE_CONTRECORD in the
next page. This indicates that the record at 0/29002138 should be
skipped as a partial record is written. However, this skip is
currently not happening, leading to the wait.
This appears to be a defect in how OVERWRITE_CONTRECORD is handled —
it fails to recognize that an OVERWRITE_CONTRECORD has been written,
indicating the current record is no longer valid and should be
skipped. As a result, it continues waiting for WAL data based on an
invalid record.

Regards,
Vignesh

In response to

Browse pgsql-committers by date

  From Date Subject
Next Message Amit Kapila 2025-06-23 13:33:14 Re: pgsql: Improve runtime and output of tests for replication slots checkp
Previous Message Alexander Korotkov 2025-06-23 12:31:25 Re: pgsql: Improve runtime and output of tests for replication slots checkp