Race conditions in 019_replslot_limit.pl

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Race conditions in 019_replslot_limit.pl
Date: 2022-02-15 21:29:20
Message-ID: 83b46e5f-2a52-86aa-fa6c-8174908174b8@iki.fi
Lists: pgsql-hackers

While looking at recent failures in the new 028_pitr_timelines.pl
recovery test, I noticed that there have been a few failures in the
buildfarm in the recoveryCheck phase even before that, in the
019_replslot_limit.pl test.

For example:


[07:42:23] t/018_wal_optimize.pl ................ ok 12403 ms ( 0.00 usr 0.00 sys + 1.40 cusr 0.63 csys = 2.03 CPU)
# poll_query_until timed out executing this query:
# SELECT wal_status FROM pg_replication_slots WHERE slot_name = 'rep3'
# expecting this output:
# lost
# last actual query output:
# unreserved



# Failed test 'have walsender pid 3682154
# 3682136'
# at t/019_replslot_limit.pl line 335.
# '3682154
# 3682136'
# doesn't match '(?^:^[0-9]+$)'

In the latter case, it looks like there are two walsenders active, which
confuses the test. I'm not sure what's happening in the first case, but at
a quick glance it looks like some kind of race condition.
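
To illustrate why the second failure reads as "two walsenders", here is a minimal sketch in Python rather than the test's own Perl. It assumes the test captures the query output as a single string with one pid per line, as the failure message suggests; the helper name looks_like_single_pid is hypothetical and does not appear in the test. The pattern is the one printed in the failure output, which carries no multi-line flag, so `$` can only match at the very end of the string.

```python
import re

# Pattern copied from the failure output. Perl prints a compiled regex as
# (?^:...), meaning default flags, so there is no multi-line mode here.
PID_PATTERN = re.compile(r"^[0-9]+$")


def looks_like_single_pid(query_output: str) -> bool:
    """Return True if the output is a single run of digits.

    As in Perl, `$` also tolerates one trailing newline at the very end.
    (Hypothetical helper, for illustration only.)
    """
    return PID_PATTERN.search(query_output) is not None


one_walsender = "3682154"
two_walsenders = "3682154\n3682136"  # the value shown in the failed test

print(looks_like_single_pid(one_walsender))   # True
print(looks_like_single_pid(two_walsenders))  # False
```

With two rows in the result, the newline between the pids keeps `$` from matching, so the check fails exactly as in the buildfarm log.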

Has anyone looked into these yet?

- Heikki

