Re: Corruption during WAL replay

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, deniel1495(at)mail(dot)ru, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, tejeswarm(at)hotmail(dot)com, Andres Freund <andres(at)anarazel(dot)de>, hlinnaka <hlinnaka(at)iki(dot)fi>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Daniel Wood <hexexpert(at)comcast(dot)net>
Subject: Re: Corruption during WAL replay
Date: 2022-03-25 01:22:38
Message-ID: 3170060.1648171358@sss.pgh.pa.us
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> I hate to say "no" because the evidence suggests that the answer might
> be "yes" -- but it definitely isn't intending to change anything about
> the shutdown sequence. It just introduces a mechanism to backends to
> force the checkpointer to delay writing the checkpoint record.

Wait a minute, I think we may be barking up the wrong tree.

The three commits that serinus saw as new in its first failure were

ce95c54376 Thu Mar 24 20:33:13 2022 UTC Fix pg_statio_all_tables view for multiple TOAST indexes.
7dac61402e Thu Mar 24 19:51:40 2022 UTC Remove unused module imports from TAP tests
412ad7a556 Thu Mar 24 18:52:28 2022 UTC Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.

I failed to look closely at dragonet, but I now see that its
first failure saw

ce95c54376 Thu Mar 24 20:33:13 2022 UTC Fix pg_statio_all_tables view for multiple TOAST indexes.
7dac61402e Thu Mar 24 19:51:40 2022 UTC Remove unused module imports from TAP tests

serinus is 0-for-3 since then, and dragonet 0-for-4, so we can be pretty
confident that the failure is repeatable for them. That means that the
culprit must be ce95c54376 or 7dac61402e, not anything nearby such as
412ad7a556.
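
To spell out that elimination, here is a minimal sketch. It assumes both animals are failing for the same reason, so the culprit has to appear in each animal's first-failure list. The function name common_suspects is made up for illustration; the hashes are copied from the lists above.

```python
# Commits each buildfarm animal reported as new in its first failing run.
first_failure_commits = {
    "serinus": {"ce95c54376", "7dac61402e", "412ad7a556"},
    "dragonet": {"ce95c54376", "7dac61402e"},
}


def common_suspects(new_commits_by_animal):
    """Return the commits present in every animal's first-failure set."""
    commit_sets = list(new_commits_by_animal.values())
    return set.intersection(*commit_sets)


# Assumption: one shared cause, so only commits seen by both animals qualify.
suspects = common_suspects(first_failure_commits)
print(sorted(suspects))
```

412ad7a556 drops out because dragonet had already built it before its first failure.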

It's *really* hard to see how the pg_statio_all_tables change could
have affected this. So that leaves 7dac61402e, which did this to
the test script that's failing:

 use strict;
 use warnings;
-use Config;
 use PostgreSQL::Test::Cluster;
 use PostgreSQL::Test::Utils;

Discuss.

regards, tom lane
