snapper and skink and fairywren (oh my!)

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: snapper and skink and fairywren (oh my!)
Date: 2022-01-26 21:17:19
Message-ID: CA+TgmoazdctqA+=UU21h=FvoVasDzTSvv76isrCKL+xW=DTWzQ@mail.gmail.com
Lists: pgsql-hackers

I was thinking about whether it made sense to try to commit anything
and decided it would be a good idea to check how the buildfarm looks
first. It doesn't look great.

1. snapper failed 4 out of the last 5 runs in recoveryCheck. The
latest run as of this writing shows this:

[19:09:50] t/026_overwrite_contrecord.pl ........ ok 43136 ms
# poll_query_until timed out executing this query:
# SELECT '0/1415E310' <= replay_lsn AND state = 'streaming' FROM pg_catalog.pg_stat_replication WHERE application_name = 'standby_1';
# expecting this output:
# t
# last actual query output:
#

2. skink failed the last run in the MiscCheck phase. I had no idea
what this phase was, because the name isn't very descriptive. It
appears to run some tests in contrib and some in src/test/modules,
which is a bit confusing. Anyway, the failure here is:

test oldest_xmin ... FAILED 5533 ms

To find the failure, the regression test suite suggests looking at
file contrib/test_decoding/output_iso/regression.diffs or
regression.out. But neither file is in the buildfarm results so far as
I can see. It does have the postmaster log but I can't tell what's
gone wrong from looking at that. In fact I'm not really sure it's the
correct log file, because oldest_xmin.spec uses a slot called
"isolation_slot" and test_decoding/log/postmaster.log refers only to
"regression_slot", so it seems like this postmaster.log file might
cover only the pg_regress tests and not the results from the isolation
tester.

3. fairywren failed the last run in module-commit_tsCheck. It's unhappy because:

[16:30:02] t/002_standby.pl .... ok 13354 ms ( 0.06 usr 0.00 sys + 1.11 cusr 3.20 csys = 4.37 CPU)
# poll_query_until timed out executing this query:
# SELECT '0/303C7D0'::pg_lsn <= pg_last_wal_replay_lsn()
# expecting this output:
# t
# last actual query output:
# f
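
For readers unfamiliar with this kind of check, here is a rough sketch
of the condition being polled in failures 1 and 3: an LSN such as
'0/303C7D0' is two hexadecimal numbers (high and low 32 bits), and the
test keeps re-checking until the replayed position reaches the target
or a timeout expires. This is only an illustration in Python, not the
actual Perl poll_query_until code; the names parse_lsn and poll_until
and the intermediate replay positions are made up for the example.

```python
import time
from typing import Callable


def parse_lsn(text: str) -> int:
    """Convert an LSN in 'hi/lo' hex form (e.g. '0/303C7D0') to an integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def poll_until(check: Callable[[], bool], timeout_s: float,
               interval_s: float = 0.1) -> bool:
    """Call check() repeatedly until it returns True or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False  # corresponds to the "timed out" case in the logs
        time.sleep(interval_s)


# Target LSN taken from the fairywren output above.
target = parse_lsn("0/303C7D0")

# Hypothetical replay positions standing in for successive results of
# pg_last_wal_replay_lsn(); the first two values are invented.
replayed = iter(["0/303C000", "0/303C700", "0/303C7D0"])

reached = poll_until(lambda: parse_lsn(next(replayed)) >= target,
                     timeout_s=1.0, interval_s=0.01)
print(reached)
```

In the fairywren case the last output was "f", meaning the comparison
itself ran but replay never reached the target before the timeout. In
the snapper case the last output was empty.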

I don't know what is causing any of these failures, and I don't know
if there's already some discussion elsewhere that I've missed, but
maybe this email will be helpful to someone. I also noticed that 2 out
of the 3 failures report 2dbb7b9b2279d064f66ce9008869fd0e2b794534 "Fix
pg_hba_file_rules for authentication method cert" as the only new
commit since the last run, and it's hardly believable that that commit
would have broken this. Nor do I see any other recent changes that
look like likely culprits. Apologies in advance if any of this is my
fault.

--
Robert Haas
EDB: http://www.enterprisedb.com
