Test failures of 100_bugs.pl

From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: Test failures of 100_bugs.pl
Date: 2023-01-24 03:23:27
Message-ID: 20230124032327.fakewkdgrzmkrr2l@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

cfbot, the buildfarm and locally I have seen 100_bugs.pl fail
occasionally. Just rarely enough that I never got around to looking into it
for real.

Just now there was another failure on master:
https://cirrus-ci.com/task/5279589287591936

[01:00:49.441] ok 1 - index predicates do not cause crash
[01:00:49.441] ok 2 - update to temporary table without replica identity with FOR ALL TABLES publication
[01:00:49.441] ok 3 - update to unlogged table without replica identity with FOR ALL TABLES publication
[01:00:49.441] # test failed
[01:00:49.441] --- stderr ---
[01:00:49.441] # poll_query_until timed out executing this query:
[01:00:49.441] # SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT IN ('r', 's');
[01:00:49.441] # expecting this output:
[01:00:49.441] # t
[01:00:49.441] # last actual query output:
[01:00:49.441] # f
[01:00:49.441] # with stderr:
[01:00:49.441] # Tests were run but no plan was declared and done_testing() was not seen.
[01:00:49.441] # Looks like your test exited with 29 just after 3.
[01:00:49.441]
[01:00:49.441] (test program exited with status code 29)

the regress log:
https://api.cirrus-ci.com/v1/artifact/task/5279589287591936/testrun/build-32/testrun/subscription/100_bugs/log/regress_log_100_bugs
and twoway's log:
https://api.cirrus-ci.com/v1/artifact/task/5279589287591936/testrun/build-32/testrun/subscription/100_bugs/log/100_bugs_twoways.log

We see t2 added to the publication:
2023-01-24 00:57:30.099 UTC [73654][client backend] [100_bugs.pl][7/5:0] LOG: statement: ALTER PUBLICATION testpub ADD TABLE t2

And that *then* "t" was synchronized:
2023-01-24 00:57:30.102 UTC [73640][logical replication worker] LOG: logical replication table synchronization worker for subscription "testsub", table "t" has finished

and then that the refresh was issued:
2023-01-24 00:57:30.128 UTC [73657][client backend] [100_bugs.pl][5/10:0] LOG: statement: ALTER SUBSCRIPTION testsub REFRESH PUBLICATION

And we see a walsender starting and the query to get the new tables being executed:
2023-01-24 00:57:30.139 UTC [73660][walsender] [testsub][6/8:0] LOG: statement: SELECT DISTINCT t.schemaname, t.tablename
, t.attnames
FROM pg_catalog.pg_publication_tables t
WHERE t.pubname IN ('testpub')

And that's it, the rest of the time is just polling.

Perhaps wait_for_subscription_sync() should dump the set of rel states to make
something like this more debuggable?

The fact that the synchronization for t finished just before the refresh makes
me wonder if a wakeup or a cache invalidation got lost?

Greetings,

Andres Freund

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Zheng Li 2023-01-24 03:27:50 Re: Support logical replication of DDLs
Previous Message Peter Geoghegan 2023-01-24 03:22:18 Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation