Re: Corruption during WAL replay

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, deniel1495(at)mail(dot)ru, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, tejeswarm(at)hotmail(dot)com, hlinnaka <hlinnaka(at)iki(dot)fi>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Daniel Wood <hexexpert(at)comcast(dot)net>
Subject: Re: Corruption during WAL replay
Date: 2022-03-25 05:26:54
Message-ID: 20220325052654.3xpbmntatyofau2w@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2022-03-24 21:54:38 -0700, Andres Freund wrote:
> I do see that the LSN that ends up on the page is the same across a few runs
> of the test on serinus. Which presumably differs between different
> animals. Surprised that it's this predictable - but I guess the run is short
> enough that there's no variation due to autovacuum, checkpoints etc.

This actually explains how the issue could become visible with
ce95c543763. That commit changes the amount of WAL initdb generates and
therefore influences which LSN the page ends up with. I've verified that the
failing test is reproducible with ce95c543763, but not with its parent
7dac61402e3, although of course ce95c543763 isn't "actually responsible".

Ah, and that's finally also the explanation for why I couldn't reproduce the
failure in a different directory, with an otherwise identically configured
PG: The length of the path to the tablespace influences the size of the
XLOG_TBLSPC_CREATE record.

Not sure what to do here... I guess we can just change the value we overwrite
the page with and hope to not hit this again? But that feels deeply deeply
unsatisfying.

Perhaps it would be enough to write into multiple parts of the page? I am very
much not a cryptography expert, but given the way pg_checksum_block() works, it
looks to me as if "multiple" changes within a 16 byte chunk have a smaller
influence on the overall result than the same "amount" of changes to separate
16 byte chunks.
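
To make the structure being discussed a bit more concrete, here is a rough
Python model of a lane-parallel FNV-style block checksum in the spirit of
pg_checksum_block(). It is only a sketch under stated assumptions: the seed
table (SEEDS) is a made-up placeholder rather than the constants the server
uses, the helper names (mix, lane_sums, page_checksum) are hypothetical, and
the zeroing of the checksum field in the page header is left out, so the
values it prints will not match real page checksums.

```python
import struct

BLCKSZ = 8192          # page size in bytes
N_SUMS = 32            # number of parallel partial sums ("lanes")
FNV_PRIME = 16777619
MASK32 = 0xFFFFFFFF

# ASSUMPTION: placeholder seeds, NOT the constants used by the server.
SEEDS = [(0x9E3779B9 * (j + 1)) & MASK32 for j in range(N_SUMS)]


def mix(checksum, value):
    """One FNV-1a style step with an extra shift to spread high bits."""
    tmp = (checksum ^ value) & MASK32
    return ((tmp * FNV_PRIME) & MASK32) ^ (tmp >> 17)


def lane_sums(page):
    """Return the 32 partial sums; word k of the page feeds lane k % 32."""
    assert len(page) == BLCKSZ
    words = struct.unpack("<%dI" % (BLCKSZ // 4), page)
    sums = list(SEEDS)
    for row in range(0, len(words), N_SUMS):
        for j in range(N_SUMS):
            sums[j] = mix(sums[j], words[row + j])
    # Two extra rounds of zeroes so the last words get mixed as well.
    for _ in range(2):
        for j in range(N_SUMS):
            sums[j] = mix(sums[j], 0)
    return sums


def page_checksum(page, blkno=0):
    """XOR-fold the lanes, mix in the block number, reduce to 1..65535."""
    result = 0
    for s in lane_sums(page):
        result ^= s
    result ^= blkno
    return (result % 65535) + 1


# Flip one byte: byte 100 lives in 4-byte word 25, which feeds lane 25.
page = bytearray(BLCKSZ)
before = lane_sums(bytes(page))
page[100] ^= 0xFF
after = lane_sums(bytes(page))

# Only lane 25 can differ; every other partial sum is untouched.
changed = [j for j in range(N_SUMS) if before[j] != after[j]]
print("lanes changed:", changed)
print("16-bit checksum:", page_checksum(bytes(page)))
```

In this model a change confined to one 4-byte column of the page perturbs only
one of the 32 partial sums before the final XOR fold, while changes spread
over different columns perturb several of them. Either way the result is
reduced to 16 bits at the end, so some overwrite that leaves the checksum
unchanged always exists.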

I might have to find a store still selling strong beverages at this hour.

- Andres
