
Re: Fix pg_waldump to exit cleanly at end of WAL

From:	Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>
To:	Michael Paquier <michael(at)paquier(dot)xyz>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Fix pg_waldump to exit cleanly at end of WAL
Date:	2025-09-03 03:20:17
Message-ID:	C84421F1-DB4E-423E-BA95-129A249AADAB@gmail.com
Lists:	pgsql-hackers

Hi Fujii and Michael,

Thanks for your comments.

> On Sep 3, 2025, at 10:47, Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Wed, Sep 03, 2025 at 09:11:15AM +0900, Fujii Masao wrote:
>> Can pg_waldump really distinguish between the end of WAL and corruption?
>
> I don't think you can really do that reliably, as some of the messages
> marking the end of WAL could also be bumped into upon a corruption, as
> far as I recall. We need the CRC record check to make the
> distinction, which we cannot do at this stage because we don't have
> the full record yet for the check.
>
> Perhaps what's been posted on your thread [1] could be revisited for
> the xlogreader because we are able to read the record headers more
> reliably thanks to Thomas' work around bae868caf222, backtracking on
> my previous take posted here, posted prior to this commit:
> https://www.postgresql.org/message-id/ZadmUE-edk2Z4CQU@paquier.xyz
>
>

My theory is as follows:

A WAL file has no specific “end of WAL record” marker. It relies purely on “xl_tot_len” to determine where the current WAL record ends and the next one starts.

Based on the code comment in xlogreader.c:

/*
 * Read the record length.
 *
 * NB: Even though we use an XLogRecord pointer here, the whole record
 * header might not fit on this page. xl_tot_len is the first field of the
 * struct, so it must be on this page (the records are MAXALIGNed), but we
 * cannot access any other fields until we've verified that we got the
 * whole header.
 */
record = (XLogRecord *) (state->readBuf + RecPtr % XLOG_BLCKSZ);
total_len = record->xl_tot_len;

As “xl_tot_len” can always be read from the current page, it is reliable. So if “xl_tot_len” is 0, that can be treated as an “end marker” of WAL.
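To illustrate the theory, here is a minimal sketch of a length-chained walk that stops when the length field is 0. It is not the xlogreader code: DemoRecord, DEMO_ALIGN, DemoStop and demo_walk are hypothetical names, the 8-byte alignment is an assumption standing in for MAXALIGN, and page headers, continuation records and the CRC check are all left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified header: only the leading length field is modeled. */
typedef struct DemoRecord
{
    uint32_t    tot_len;        /* total record length, header included */
} DemoRecord;

/* Assumption: records are aligned to 8 bytes, standing in for MAXALIGN. */
#define DEMO_ALIGN(len) (((len) + 7u) & ~7u)

typedef enum
{
    DEMO_APPARENT_END,          /* length field was 0, or the buffer ran out */
    DEMO_INVALID_LENGTH         /* length field is nonzero but implausible */
} DemoStop;

/*
 * Walk the records in buf by following the length field of each one.
 * *nrecords receives the number of records passed before stopping.
 */
static DemoStop
demo_walk(const unsigned char *buf, size_t buflen, int *nrecords)
{
    size_t      off = 0;

    assert(buf != NULL && nrecords != NULL);
    *nrecords = 0;

    while (off + sizeof(uint32_t) <= buflen)
    {
        uint32_t    tot_len;

        /* The length is the first field, so it can be read on its own. */
        memcpy(&tot_len, buf + off, sizeof(tot_len));

        /* Zeroed space and a length corrupted to 0 look identical here. */
        if (tot_len == 0)
            return DEMO_APPARENT_END;

        /* A length that cannot fit in the remaining bytes is rejected. */
        if (tot_len < sizeof(DemoRecord) || tot_len > buflen - off)
            return DEMO_INVALID_LENGTH;

        (*nrecords)++;
        off += DEMO_ALIGN(tot_len);
    }

    return DEMO_APPARENT_END;
}
```

In this sketch, the walk over a buffer whose second record has its length zeroed stops after one record with the same result as a buffer that really ends there, which is the ambiguity under discussion.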

If WAL happens to be corrupted and xl_tot_len is overwritten with 0, then the WAL chain is broken. But the probability should be very low: WAL corruption itself is rare, and even when WAL is corrupted, xl_tot_len may be overwritten with a random value, so the chance of it being exactly 0 is even lower.

But yes, we still cannot be 100% sure whether that is “end of WAL” or a corruption. So maybe we can simply take Tom’s suggestion and change the log message to “reached apparent end of WAL stream”, which keeps the error hint while making the message less scary. That would be a small enhancement.

One thing I am not sure about is whether the error message change would break callers. pg_waldump just prints the error message. For xlogrecovery.c, I did a quick test, and it looks like it just swallows the error message:

```
2025-09-03 10:46:48.492 CST [52426] LOG: starting archive recovery
2025-09-03 10:46:48.495 CST [52426] LOG: consistent recovery state reached at 0/017AAA90
2025-09-03 10:46:48.495 CST [52420] LOG: database system is ready to accept read-only connections
2025-09-03 10:46:48.495 CST [52426] LOG: redo starts at 0/017AAA90
2025-09-03 10:46:48.496 CST [52426] LOG: redo done at 0/017AF398 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2025-09-03 10:46:48.496 CST [52426] LOG: last completed transaction was at log time 2025-09-03 10:42:02.901807+08
```

So, I guess xlogreader may return a different log message when xl_tot_len is 0. Please correct me if my understanding is wrong.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

In response to

Re: Fix pg_waldump to exit cleanly at end of WAL at 2025-09-03 02:47:06 from Michael Paquier
