Re: detailed error message of pg_waldump

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: sawada(dot)mshk(at)gmail(dot)com
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: detailed error message of pg_waldump
Date: 2021-06-16 08:35:58
Message-ID: 20210616.173558.1189768841998532858.horikyota.ntt@gmail.com
Lists: pgsql-hackers

Thanks!

At Wed, 16 Jun 2021 16:52:11 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in
> On Fri, Jun 4, 2021 at 5:35 PM Kyotaro Horiguchi
> <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> >
> > In a very common operation of accidentally specifying a recycled
> > segment, pg_waldump often returns the following obscure message.
> >
> > $ pg_waldump 00000001000000000000002D
> > pg_waldump: fatal: could not find a valid record after 0/2D000000
> >
> > The more detailed message is generated internally and we can use it.
> > That looks like the following.
> >
> > $ pg_waldump 00000001000000000000002D
> > pg_waldump: fatal: unexpected pageaddr 0/24000000 in log segment 00000001000000000000002D, offset 0
> >
> > Is it worth doing?
>
> Perhaps we need both? The current message describes where the error
> happened and the message internally generated describes the details.
> It seems to me that both are useful. For example, if we find an error
> during XLogReadRecord(), we show both as follows:
>
> if (errormsg)
>     fatal_error("error in WAL record at %X/%X: %s",
>                 LSN_FORMAT_ARGS(xlogreader_state->ReadRecPtr),
>                 errormsg);

Yeah, I thought that it might be a bit verbose and lengthy, but actually
we already have another place that does that. One more point is whether
there is a case where first_record is invalid but errormsg is NULL
there. WALDumpReadPage exits immediately on failure, so we should always
have a message in that case, according to the comment in ReadRecord.

> * We only end up here without a message when XLogPageRead()
> * failed - in that case we already logged something. In
> * StandbyMode that only happens if we have been triggered, so we
> * shouldn't loop anymore in that case.

So that can be an assertion.

Now the message looks like this.

$ pg_waldump /home/horiguti/data/data_work/pg_wal/000000020000000000000010
pg_waldump: fatal: could not find a valid record after 0/0: unexpected pageaddr 0/9000000 in log segment 000000020000000000000010, offset 0
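
As a rough illustration of the shape discussed above (not the attached
patch itself), the following is a minimal standalone sketch. It assumes
a hypothetical helper format_first_record_error() and a hypothetical
WalPtr type standing in for the real WAL position type; neither name
comes from the PostgreSQL source. It appends the reader's detailed
message to the existing "could not find a valid record" text and
asserts that the detail is never NULL on this path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a 64-bit WAL position; not the real type. */
typedef uint64_t WalPtr;

/*
 * Hypothetical helper: write the combined message into buf.
 * "start" is the position the search began at, and "errormsg" is the
 * detailed message produced by the WAL reader.
 * Returns the value of snprintf().
 */
static int
format_first_record_error(char *buf, size_t buflen,
                          WalPtr start, const char *errormsg)
{
    /*
     * Assumption taken from the discussion: the page-read callback
     * exits on failure, so a detailed message should always be set
     * when no valid first record is found.
     */
    assert(errormsg != NULL);

    /* Print the position as high/low 32-bit halves in hex. */
    return snprintf(buf, buflen,
                    "could not find a valid record after %X/%X: %s",
                    (unsigned int) (start >> 32),
                    (unsigned int) start,
                    errormsg);
}
```

Called with a start position of 0 and the "unexpected pageaddr ..."
detail, this produces the same text as the message shown above, minus
the "pg_waldump: fatal: " prefix.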

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment                          Content-Type   Size
pg_waldump_detailed_error_2.patch   text/x-patch   3.6 KB
