Re: Adding a TAP test checking data consistency on standby with minRecoveryPoint

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Arthur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru>, Georgios Kokolatos <gkokolatos(at)pm(dot)me>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Adding a TAP test checking data consistency on standby with minRecoveryPoint
Date: 2019-03-24 12:47:58
Message-ID: 20190324124758.GC2558@paquier.xyz
Lists: pgsql-hackers

On Sat, Mar 23, 2019 at 04:08:42PM -0700, Peter Geoghegan wrote:
> Seems like there might be a problem either caused by or detected by
> 016_min_consistency.pl on piculet:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=piculet&dt=2019-03-23%2022%3A28%3A59

Interesting. Based on what regress_log_016_min_consistency shows,
the test attempts to stop the standby in fast mode, but this fails
because of a timeout:
### Stopping node "standby" using mode fast
[...]
pg_ctl: server does not shut down
Bail out! system pg_ctl failed

There is only one place in the tests where that happens. Before
attempting to stop the standby, we issue a checkpoint on it and then
kill its primary:
# Issue a restart point on the standby now, which makes the checkpointer
# update minRecoveryPoint.
$standby->safe_psql('postgres', 'CHECKPOINT;');
[...]
$primary->stop('immediate');
$standby->stop('fast');

The failure is a bit weird, as I would expect all three of those
actions to run sequentially. piculet is the only buildfarm member
showing this failure, and it uses --disable-atomics, so I am wondering
whether that is related and whether 0dfe3d0 plays a part in it. With
a primary/standby setup, it should be possible to test that scenario
pretty easily. I'll give it a shot.
--
Michael
