Re: Propagate XLogFindNextRecord error to callers

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>
Cc:	Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, Mircea Cadariu <cadariu(dot)mircea(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Japin Li <japinli(at)hotmail(dot)com>
Subject:	Re: Propagate XLogFindNextRecord error to callers
Date:	2026-03-17 09:48:46
Message-ID:	CAHGQGwFSLQZQ+GVnpNDje48D2zpEQQ3nxodHw7Obt0-PmzQNFg@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Feb 26, 2026 at 5:19 PM Anthonin Bonnefoy
<anthonin(dot)bonnefoy(at)datadoghq(dot)com> wrote:
>
> Thanks for the comments!
>
> On Tue, Feb 24, 2026 at 5:00 AM Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> wrote:
> > From a design perspective, I’m not sure we need to add a new errormsg parameter to XLogFindNextRecord(). The new parameter ultimately just exposes state->errormsg_buf, so the returned errormsg implicitly depends on the lifetime of state, and we also need extra handling for cases like errormsg == NULL.
> >
> > Instead, perhaps we could add a helper function, say XLogReaderGetLastError(XLogReaderState *state), which internally pstrdup()s state->errormsg_buf (after checking errormsg_deferred, etc.). That way the caller owns the returned string explicitly, and there’s no hidden dependency on the reader state’s lifetime.
> >
> > This would also avoid changing the XLogFindNextRecord() signature while making the ownership semantics clearer.
>
> One issue I see is that it introduces another way to get the error,
> with XLogReadRecord and XLogNextRecord using an errormsg parameter,
> and XLogFindNextRecord using the helper function. Maybe the solution
> would be to change both XLogReadRecord and XLogNextRecord to use this
> new function to stay consistent, but that means changing their
> signatures.
>
> Also, I see the errormsg parameter as a way to signal the caller that
> "this function can fail, the detailed error will be available here".
> With the XLogReaderGetLastError, it becomes the caller's
> responsibility to know which function may fill the error message and
> check it accordingly.
>
> The error message is likely printed shortly after the function's call,
> so I suspect the risk of using the errormsg after its intended
> lifetime is low.
>
> You bring up a good point about the errormsg's lifetime, which is
> definitely something to mention in the function's comments. I've
> updated the patch with the additional comments.

Since this patch is marked Ready for Committer, I've started reviewing it.

+ * When set, *errormsg points to an internal buffer that's valid until the next
+ * call to XLogReadRecord.

Could that buffer also be invalidated by other functions that modify
"XLogReaderState *state", such as XLogBeginRead()?

+# Wrong WAL version. We copy an existing wal file and set the
+# page's magic value to 0000.

Would it be better to describe the purpose of this test at the top?
For example:

# Test that pg_waldump reports a detailed error message when dumping
# a WAL file with an invalid magic number (0000).
{
	# The broken WAL file is created by copying a valid WAL file and
	# overwriting its magic number with 0000.
	my $broken_wal_dir = PostgreSQL::Test::Utils::tempdir_short();

+ open($fh, '+<', $broken_wal);
+ close($fh);

Should we add error handling like:

open(my $fh, '+<', $broken_wal)
  or BAIL_OUT("open($broken_wal) failed: $!");
close($fh)
  or BAIL_OUT("close failed: $!");

Also, other similar tests seem to call binmode. Is it really unnecessary
in this case?

Regards,

--
Fujii Masao

In response to

Re: Propagate XLogFindNextRecord error to callers at 2026-02-26 08:19:09 from Anthonin Bonnefoy

Responses

Re: Propagate XLogFindNextRecord error to callers at 2026-03-23 08:15:13 from Anthonin Bonnefoy
