A failure in 031_recovery_conflict.pl on Debian/s390x

From: Christoph Berg <myon(at)debian(dot)org>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: A failure in 031_recovery_conflict.pl on Debian/s390x
Date: 2023-08-04 12:43:29
Message-ID: ZMzycdG5IWCM73ll@msg.df7cb.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Re: Noah Misch
> On Tue, Jul 25, 2023 at 01:56:41PM +0530, Bharath Rupireddy wrote:
> > I've observed the following failure once in one of my Cirrus CI runs
> > on Windows Server on HEAD:
> >
> > timed out waiting for match: (?^:User was holding shared buffer pin
> > for too long) at
> > C:/cirrus/src/test/recovery/t/031_recovery_conflict.pl line 318.
> > # Postmaster PID for node "primary" is 696
> >
> > https://api.cirrus-ci.com/v1/artifact/task/5272062399348736/testrun/build/testrun/recovery/031_recovery_conflict/log/regress_log_031_recovery_conflict
> >
> > https://github.com/BRupireddy/postgres/runs/15296698158
>
> Known:
> https://postgr.es/m/flat/20220409045515.35ypjzddp25v72ou%40alap3.anarazel.de
> https://postgr.es/m/flat/CA+hUKGK3PGKwcKqzoosamn36YW-fsuTdOPPF1i_rtEO=nEYKSg(at)mail(dot)gmail(dot)com
>
> A good next step would be patch "Fix recovery conflict SIGUSR1 handling" from
> https://postgr.es/m/CA+hUKG+Hi5P1j_8cVeqLLwNSVyJh4RntF04fYWkeNXfTrH2MYA@mail.gmail.com
> (near the end of that second thread).

I am also seeing problems with t/031_recovery_conflict.pl, on the
newly added s390x architecture on apt.postgresql.org. The failure is
flaky, but has been popping up in build logs often enough.

I managed to reproduce it on the shell by running the test in a loop a
few times. The failure looks like this:

echo "# +++ tap check in src/test/recovery +++" && rm -rf '/home/myon/postgresql/pg/postgresql/build/src/test/recovery'/tmp_check && /bin/mkdir -p '/home/myon/postgresql/pg/postgresql/build/src/test/recovery'/tmp_check && cd /home/myon/postgresql/pg/postgresql/build/../src/test/recovery && TESTLOGDIR='/home/myon/postgresql/pg/postgresql/build/src/test/recovery/tmp_check/log' TESTDATADIR='/home/myon/postgresql/pg/postgresql/build/src/test/recovery/tmp_check' PATH="/home/myon/postgresql/pg/postgresql/build/tmp_install/usr/lib/postgresql/17/bin:/home/myon/postgresql/pg/postgresql/build/src/test/recovery:$PATH" LD_LIBRARY_PATH="/home/myon/postgresql/pg/postgresql/build/tmp_install/usr/lib/s390x-linux-gnu" PGPORT='65432' top_builddir='/home/myon/postgresql/pg/postgresql/build/src/test/recovery/../../..' PG_REGRESS='/home/myon/postgresql/pg/postgresql/build/src/test/recovery/../../../src/test/regress/pg_regress' /usr/bin/prove -I /home/myon/postgresql/pg/postgresql/build/../src/test/perl/ -I /home/myon/postgresql/pg/postgresql/build/../src/test/recovery --verbose t/031_recovery_conflict.pl
# +++ tap check in src/test/recovery +++
t/031_recovery_conflict.pl ..
# issuing query via background psql:
# BEGIN;
# DECLARE test_recovery_conflict_cursor CURSOR FOR SELECT b FROM test_recovery_conflict_table1;
# FETCH FORWARD FROM test_recovery_conflict_cursor;
ok 1 - buffer pin conflict: cursor with conflicting pin established
ok 2 - buffer pin conflict: logfile contains terminated connection due to recovery conflict
ok 3 - buffer pin conflict: stats show conflict on standby
# issuing query via background psql:
# BEGIN;
# DECLARE test_recovery_conflict_cursor CURSOR FOR SELECT b FROM test_recovery_conflict_table1;
# FETCH FORWARD FROM test_recovery_conflict_cursor;
#
ok 4 - snapshot conflict: cursor with conflicting snapshot established
ok 5 - snapshot conflict: logfile contains terminated connection due to recovery conflict
ok 6 - snapshot conflict: stats show conflict on standby
# issuing query via background psql:
# BEGIN;
# LOCK TABLE test_recovery_conflict_table1 IN ACCESS SHARE MODE;
# SELECT 1;
#
ok 7 - lock conflict: conflicting lock acquired
# Tests were run but no plan was declared and done_testing() was not seen.
# Looks like your test exited with 255 just after 7.
Dubious, test returned 255 (wstat 65280, 0xff00)
All 7 subtests passed

Test Summary Report
-------------------
t/031_recovery_conflict.pl (Wstat: 65280 Tests: 7 Failed: 0)
Non-zero exit status: 255
Parse errors: No plan found in TAP output
Files=1, Tests=7, 186 wallclock secs ( 0.01 usr 0.00 sys + 0.74 cusr 0.71 csys = 1.46 CPU)
Result: FAIL
make: *** [Makefile:23: check] Error 1

I.e. the test file just exits after test 7, without running the rest.

Is it dying because it's running into a deadlock instead of some other
expected error message?

timed out waiting for match: (?^:User was holding a relation lock for too long) at t/031_recovery_conflict.pl line 318.
2023-08-04 12:20:38.061 UTC [1920073] 031_recovery_conflict.pl FATAL: terminating connection due to conflict with recovery

Christoph

Attachment Content-Type Size
031_recovery_conflict_primary.log text/plain 8.1 KB
031_recovery_conflict_standby.log text/plain 7.0 KB
regress_log_031_recovery_conflict text/plain 6.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hayato Kuroda (Fujitsu) 2023-08-04 12:54:51 RE: [PoC] pg_upgrade: allow to upgrade publisher node
Previous Message Amit Langote 2023-08-04 12:21:49 Re: remaining sql/json patches