Re: Build-farm - intermittent error in 031_column_list.pl

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: smithpb2250(at)gmail(dot)com
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Build-farm - intermittent error in 031_column_list.pl
Date: 2022-05-19 06:58:04
Message-ID: 20220519.155804.753270708308766360.horikyota.ntt@gmail.com
Lists: pgsql-hackers

At Thu, 19 May 2022 14:26:56 +1000, Peter Smith <smithpb2250(at)gmail(dot)com> wrote in
> Hi hackers.
>
> FYI, I saw that there was a recent Build-farm error on the "grison" machine [1]
> [1] https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=grison&br=HEAD
>
> The error happened during "subscriptionCheck" phase in the TAP test
> t/031_column_list.pl
> This test file was added by this [2] commit.
> [2] https://github.com/postgres/postgres/commit/923def9a533a7d986acfb524139d8b9e5466d0a5

In all of these failures, a publication that CREATE PUBLICATION
created without reporting any error appears to be missing when a
walsender that starts later looks it up. It seems that either CREATE
PUBLICATION can silently fail to create the publication, or the
walsender somehow fails to find the existing one.

> ~~
>
> I checked the history of fails for that TAP test t/031_column_list.pl
> and found that this same error seems to have been happening
> intermittently for at least the last 50 days.
>
> Details of similar previous errors from the BF are listed below.
>
> ~~~
>
> 1. Details for system "grison" failure at stage subscriptionCheck,
> snapshot taken 2022-05-18 18:11:45
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grison&dt=2022-05-18%2018%3A11%3A45
>
> [22:02:08] t/029_on_error.pl .................. ok 25475 ms ( 0.01
> usr 0.00 sys + 15.39 cusr 5.59 csys = 20.99 CPU)
> # poll_query_until timed out executing this query:
> # SELECT '0/1530588' <= replay_lsn AND state = 'streaming'
> # FROM pg_catalog.pg_stat_replication
> # WHERE application_name IN ('sub1', 'walreceiver')
> # expecting this output:
> # t
> # last actual query output:
> #
> # with stderr:
> # Tests were run but no plan was declared and done_testing() was not seen.
> # Looks like your test exited with 29 just after 22.
> [22:09:25] t/031_column_list.pl ...............
> ...
> [22:02:47.887](1.829s) ok 22 - partitions with different replica
> identities not replicated correctly
> Waiting for replication conn sub1's replay_lsn to pass 0/1530588 on publisher
> [22:09:25.395](397.508s) # poll_query_until timed out executing this query:
> # SELECT '0/1530588' <= replay_lsn AND state = 'streaming'
> # FROM pg_catalog.pg_stat_replication
> # WHERE application_name IN ('sub1', 'walreceiver')
> # expecting this output:
> # t
> # last actual query output:
> #
> # with stderr:
> timed out waiting for catchup at t/031_column_list.pl line 728.
> ### Stopping node "publisher" using mode immediate

2022-04-17 00:16:04.278 CEST [293659][client backend][4/270:0][031_column_list.pl] LOG: statement: CREATE PUBLICATION pub9 FOR TABLE test_part_d (a) WITH (publish_via_partition_root = true);
2022-04-17 00:16:04.279 CEST [293659][client backend][:0][031_column_list.pl] LOG: disconnection: session time: 0:00:00.002 user=bf database=postgres host=[local]

"CREATE PUBLICATION pub9" is executed at 00:16:04.278 by process
293659, and then the session disconnects. But the subsequent request
for the same publication fails because the publication does not exist.

2022-04-17 00:16:08.147 CEST [293856][walsender][3/0:0][sub1] STATEMENT: START_REPLICATION SLOT "sub1" LOGICAL 0/153DB88 (proto_version '3', publication_names '"pub9"')
2022-04-17 00:16:08.148 CEST [293856][walsender][3/0:0][sub1] ERROR: publication "pub9" does not exist
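To check the other buildfarm logs for the same pattern (a publication
created without error and later reported missing by a walsender), a
minimal sketch like the one below could be used. It assumes the log
line format shown in the excerpts above and unquoted publication
names; created_then_missing is a hypothetical helper, not part of the
PostgreSQL test suite. It does not track DROP PUBLICATION, so a
publication that was deliberately dropped before the walsender looked
it up would also be reported.

```python
import re

# Assumed log format: "... LOG:  statement: CREATE PUBLICATION <name> ..."
# and "... ERROR:  publication "<name>" does not exist", as in the excerpts.
CREATE_RE = re.compile(r'LOG:\s+statement: CREATE PUBLICATION (\w+)')
MISSING_RE = re.compile(r'ERROR:\s+publication "([^"]+)" does not exist')


def created_then_missing(log_text):
    """Hypothetical helper: return names of publications that were created
    by a logged CREATE PUBLICATION statement and, on a later line, reported
    as nonexistent. DROP PUBLICATION is not tracked."""
    created = set()
    flagged = []
    for line in log_text.splitlines():
        m = CREATE_RE.search(line)
        if m:
            created.add(m.group(1))
            continue
        m = MISSING_RE.search(line)
        # Only flag errors that come after the publication was created.
        if m and m.group(1) in created and m.group(1) not in flagged:
            flagged.append(m.group(1))
    return flagged


if __name__ == "__main__":
    sample = (
        "2022-04-17 00:16:04.278 CEST [293659][client backend][4/270:0]"
        "[031_column_list.pl] LOG: statement: CREATE PUBLICATION pub9 "
        "FOR TABLE test_part_d (a) WITH (publish_via_partition_root = true);\n"
        "2022-04-17 00:16:08.148 CEST [293856][walsender][3/0:0][sub1] "
        'ERROR: publication "pub9" does not exist\n'
    )
    print(created_then_missing(sample))  # prints ['pub9']
```

Running it over each failing run's publisher log would show whether
the missing publication is always one that the test had just created.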

> ~~~
>
> 2. Details for system "xenodermus" failure at stage subscriptionCheck,
> snapshot taken 2022-04-16 21:00:04
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=xenodermus&dt=2022-04-16%2021%3A00%3A04

The same. pub9 is missing after creation.

> ~~~
>
> 3. Details for system "phycodurus" failure at stage subscriptionCheck,
> snapshot taken 2022-04-05 17:30:04
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=phycodurus&dt=2022-04-05%2017%3A30%3A04

The same happens for pub7.

> 4. Details for system "phycodurus" failure at stage subscriptionCheck,
> snapshot taken 2022-04-05 17:30:04
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=phycodurus&dt=2022-04-05%2017%3A30%3A04

Same. pub7 is missing.

> 5. Details for system "grison" failure at stage subscriptionCheck,
> snapshot taken 2022-04-03 18:11:39
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grison&dt=2022-04-03%2018%3A11%3A39

Same. pub7 is missing.

--
Kyotaro Horiguchi
NTT Open Source Software Center
