Re: Add LSN along with offset to error messages reported for WAL file read/write/validate header failures

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: bharath(dot)rupireddyforpostgres(at)gmail(dot)com
Cc: alvherre(at)alvh(dot)no-ip(dot)org, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Add LSN along with offset to error messages reported for WAL file read/write/validate header failures
Date: 2022-09-27 03:01:25
Message-ID: 20220927.120125.579639936942345624.horikyota.ntt@gmail.com
Lists: pgsql-hackers

At Tue, 20 Sep 2022 17:40:36 +0530, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote in
> On Tue, Sep 20, 2022 at 12:57 PM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> >
> > On 2022-Sep-19, Bharath Rupireddy wrote:
> >
> > > We have a bunch of messages [1] that have an offset, but not LSN in
> > > the error message. Firstly, is there an easiest way to figure out LSN
> > > from offset reported in the error messages? If not, is adding LSN to
> > > these messages along with offset a good idea? Of course, we can't just
> > > convert offset to LSN using XLogSegNoOffsetToRecPtr() and report, but
> > > something meaningful like reporting the LSN of the page that we are
> > > reading-in or writing-out etc.
> >
> > Maybe add errcontext() somewhere that reports the LSN would be
> > appropriate. For example, the page_read() callbacks have the LSN
> > readily available, so the ones in backend could install the errcontext
> > callback; or perhaps ReadPageInternal can do it #ifndef FRONTEND. Not
> > sure what is best of those options, but either of those sounds better
> > than sticking the LSN in a lower-level routine that doesn't necessarily
> > have the info already.
>
> All of the error messages [1] have the LSN from which offset was
> calculated, I think we can just append that to the error messages
> (something like ".... offset %u, LSN %X/%X: %m") and not complicate
> it. Thoughts?

If every error-emitting site knows the LSN, we don't need the context
message. But I would prefer that the additional text read something like
"while reading record at LSN %X/%X", or a slightly shorter version of
it, because targetRecPtr is the start of the record currently being
read, not the LSN that corresponds to the segment and offset. It may
point into an earlier segment.
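
To illustrate the distinction, here is a minimal standalone sketch. It is
not PostgreSQL code: lsn_t, lsn_segno(), lsn_segoff() and
format_read_error() are hypothetical names, the 16MB segment size is an
assumption (the default), and the combined message wording is only one
possible shape of the proposal.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for XLogRecPtr: a 64-bit byte position in the WAL. */
typedef uint64_t lsn_t;

/* Assumed segment size: 16MB, the default. */
#define WAL_SEG_SIZE ((uint64_t) 16 * 1024 * 1024)

/* Segment number that contains the given LSN. */
uint64_t
lsn_segno(lsn_t lsn)
{
    return lsn / WAL_SEG_SIZE;
}

/* Byte offset of the given LSN within its segment. */
uint32_t
lsn_segoff(lsn_t lsn)
{
    return (uint32_t) (lsn % WAL_SEG_SIZE);
}

/*
 * Build an illustrative message. read_lsn is where the failing read
 * happened; record_lsn is the start of the record being read, which
 * may lie in an earlier segment than read_lsn.
 */
int
format_read_error(char *buf, size_t buflen, lsn_t read_lsn, lsn_t record_lsn)
{
    assert(buf != NULL);
    return snprintf(buf, buflen,
                    "could not read from WAL segment number %" PRIu64
                    ", offset %" PRIu32
                    ", while reading record at LSN %X/%X",
                    lsn_segno(read_lsn), lsn_segoff(read_lsn),
                    (unsigned int) (record_lsn >> 32),
                    (unsigned int) record_lsn);
}
```

For a record that starts at 0/1FFFFF0 and continues onto the page at
0/2000000, the failing read is in segment number 2 at offset 0, while the
record LSN lies in segment number 1. Printing only "LSN %X/%X" next to
the offset could therefore be read as the LSN of that offset, which it
is not.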

> [1]
> errmsg("could not read from WAL segment %s, offset %u: %m",
> errmsg("could not read from WAL segment %s, offset %u: %m",
> errmsg("could not write to log file %s "
> "at offset %u, length %zu: %m",
> errmsg("unexpected timeline ID %u in WAL segment %s, offset %u",
> errmsg("could not read from WAL segment %s, offset %u: read %d of %zu",
> pg_log_error("received write-ahead log record for offset %u with no file open",
> "invalid magic number %04X in WAL segment %s, offset %u",
> "invalid info bits %04X in WAL segment %s, offset %u",
> "invalid info bits %04X in WAL segment %s, offset %u",
> "unexpected pageaddr %X/%X in WAL segment %s, offset %u",
> "out-of-sequence timeline ID %u (after %u) in WAL segment %s, offset %u",

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
