Re: pg_walinspect: ReadNextXLogRecord's first_record argument

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pg_walinspect: ReadNextXLogRecord's first_record argument
Date: 2022-08-17 04:40:48
Message-ID: CALj2ACUMBOM=f6PSH=J0kHsWE7psPKmRDEZ+ZfahsP=8-_bYeQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Aug 16, 2022 at 10:04 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> Hi,
>
> I was looking at the code for pg_walinspect today and I think I may
> have found a bug (or else I'm confused about how this all works, which
> is also possible). ReadNextXLogRecord() takes an argument first_record
> of type XLogRecPtr which is used only for error reporting purposes: if
> we fail to read the next record for a reason other than end-of-WAL, we
> complain that we couldn't read the WAL at the LSN specified by
> first_record.
>
> ReadNextXLogRecord() has three callers. In pg_get_wal_record_info(),
> we're just reading a single record, and the LSN passed as first_record
> is the LSN at which that record starts. Cool. But in the other two
> callers, GetWALRecordsInfo() and GetWalStats(), we're reading multiple
> records, and first_record is always passed as the LSN of the first
> record. That's logical enough given the name of the argument, but the
> effect of it seems to be that an error while reading any of the
> records will be reported using the LSN of the first record, which does
> not seem right.

Indeed. Thanks a lot for finding it.

> By contrast, pg_rewind's extractPageMap() reports the error using
> xlogreader->EndRecPtr. I think that's correct. The toplevel xlogreader
> function that we're calling here is XLogReadRecord(), which in turn
> calls XLogNextRecord(), which has this comment:
>
> /*
> * state->EndRecPtr is expected to have been set by the last call to
> * XLogBeginRead() or XLogNextRecord(), and is the location of the
> * error.
> */
>
> Thoughts?

Agreed.

Here's a patch (for V15 as well) fixing this bug, please review.

--
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/

Attachment Content-Type Size
v1-master-Use-correct-LSN-for-error-reporting-in-pg_walinsp.patch application/octet-stream 5.6 KB
v1-PG15-Use-correct-LSN-for-error-reporting-in-pg_walinsp.patch application/octet-stream 5.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2022-08-17 04:48:14 Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock
Previous Message Dilip Kumar 2022-08-17 04:02:07 Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints