Re: has_wal_read_bug

From:	Noah Misch <noah(at)leadboat(dot)com>
To:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: has_wal_read_bug
Date:	2022-05-17 07:15:35
Message-ID:	20220517071535.GB2792153@rfd.leadboat.com
Lists:	pgsql-hackers

On Tue, May 17, 2022 at 11:50:51AM +1200, Thomas Munro wrote:
> 027_stream_regress.pl has:
>
> if (PostgreSQL::Test::Utils::has_wal_read_bug)
> {
>     # We'd prefer to use Test::More->builder->todo_start, but the bug causes
>     # this test file to die(), not merely to fail.
>     plan skip_all => 'filesystem bug';
> }
>
> Is the die() referenced there the one from the system_or_bail() call
> that commit a096813b got rid of?

No, it was the 'croak "timed out waiting for catchup"',
e.g. https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tadarida&dt=2022-01-25%2016%3A56%3A26

> Here's a failure in 031_recovery_conflict.pl that smells like
> concurrent pread() corruption:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tadarida&dt=2022-05-16%2015%3A45%3A54
>
> 2022-05-16 18:10:33.375 CEST [52106:1] LOG: started streaming WAL from primary at 0/3000000 on timeline 1
> 2022-05-16 18:10:33.621 CEST [52105:5] LOG: incorrect resource manager data checksum in record at 0/338FDC8
> 2022-05-16 18:10:33.622 CEST [52106:2] FATAL: terminating walreceiver process due to administrator command

Agreed. Here, too:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tadarida&dt=2022-05-09%2015%3A46%3A03

> Presumably we also need the has_wal_read_bug kludge in all these new
> tests that use replication.

That is an option. One alternative is to reconfigure those three animals to
remove --enable-tap-tests. Another alternative is to make the build system
skip all files of all TAP suites on affected systems. Handling this on a
file-by-file basis seemed reasonable to me when only two files had failed that
way. Now, five files have failed. We have wait_for_catchup calls in
fifty-one files, and I wouldn't have chosen the has_wal_read_bug approach if I
had expected fifty-one files to eventually call it. I could tolerate it,
though.

In response to

has_wal_read_bug at 2022-05-16 23:50:51 from Thomas Munro

Responses

Re: has_wal_read_bug at 2022-10-30 03:16:39 from Noah Misch
