Re: WAL page magic errors (and plenty others) got hard to debug.

From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Subject: Re: WAL page magic errors (and plenty others) got hard to debug.
Date: 2020-04-21 08:08:31
Message-ID: 20200421080831.nv4jmhxpyyqa5bfe@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2020-04-05 15:49:16 -0700, Andres Freund wrote:
> When starting on a data directory with an older WAL page magic, we
> currently make that hard to debug. E.g.:
>
> 2020-04-05 15:31:04.314 PDT [1896669][:0] LOG: database system was shut down at 2020-04-05 15:24:56 PDT
> 2020-04-05 15:31:04.314 PDT [1896669][:0] LOG: invalid primary checkpoint record
> 2020-04-05 15:31:04.314 PDT [1896669][:0] PANIC: could not locate a valid checkpoint record
> 2020-04-05 15:31:04.315 PDT [1896668][:0] LOG: startup process (PID 1896669) was terminated by signal 6: Aborted
> 2020-04-05 15:31:04.315 PDT [1896668][:0] LOG: aborting startup due to startup process failure
> 2020-04-05 15:31:04.316 PDT [1896668][:0] LOG: database system is shut down
>
> As far as I can tell this is not just the case for a wrong page magic,
> but for all page level validation errors.
>
> I think this largely originates in:
>
> commit 0668719801838aa6a8bda330ff9b3d20097ea844
> Author: Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi>
> Date: 2018-05-05 01:34:53 +0300
>
> Fix scenario where streaming standby gets stuck at a continuation record.

Heikki, Kyotaro, it'd be good if you could comment on what motivated
this approach. Because it sure as hell hides a lot of useful information
when there's a problem with WAL. Or well, all information.
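To make the complaint concrete, here is a hypothetical, heavily simplified C model of the pattern. It is not PostgreSQL's actual code: every name, the struct layout and the magic value are invented for illustration. It assumes a page-header check that records *why* it failed in an error buffer, and a caller that wipes that buffer before retrying. In that case, nothing downstream can report the real cause.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical magic value, not PostgreSQL's real XLOG_PAGE_MAGIC. */
#define EXPECTED_PAGE_MAGIC 0xABCDu

typedef struct
{
    uint16_t magic;             /* hypothetical page header field */
} PageHeader;

typedef struct
{
    char errormsg[128];         /* reason for the most recent validation failure */
} Reader;

/* Validate the header, recording the reason for a failure in reader->errormsg. */
static bool
validate_page_header(Reader *reader, const PageHeader *hdr)
{
    assert(reader != NULL && hdr != NULL);
    if (hdr->magic != EXPECTED_PAGE_MAGIC)
    {
        snprintf(reader->errormsg, sizeof(reader->errormsg),
                 "invalid magic number %04X", (unsigned) hdr->magic);
        return false;
    }
    reader->errormsg[0] = '\0';
    return true;
}

/* Pattern that hides the cause: the reason is wiped before anyone reports it. */
static bool
read_page_quiet(Reader *reader, const PageHeader *hdr)
{
    if (!validate_page_header(reader, hdr))
    {
        reader->errormsg[0] = '\0';     /* discard the reason, then "retry" */
        return false;
    }
    return true;
}

/* Alternative: surface the reason before retrying, so it reaches the log. */
static bool
read_page_logged(Reader *reader, const PageHeader *hdr, FILE *log)
{
    if (!validate_page_header(reader, hdr))
    {
        fprintf(log, "LOG: %s\n", reader->errormsg);
        return false;
    }
    return true;
}
```

Both callers fail the same way. The only difference is whether the specific reason is logged before it is lost. With the quiet variant, an operator sees only whatever generic message comes later, such as the "invalid primary checkpoint record" line quoted above.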

- Andres

Greetings,

Andres Freund
