Re: Build-farm - intermittent error in 031_column_list.pl

From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>, 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Build-farm - intermittent error in 031_column_list.pl
Date: 2024-01-16 18:00:01
Message-ID: 16d6d9cc-f97d-0b34-be65-425183ed3721@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello hackers,

01.06.2022 05:06, osumi(dot)takamichi(at)fujitsu(dot)com wrote:
> FYI, I've noticed that after the last report by Peter-san
> we've gotten the same errors on Build Farm.
> We need to keep discussing to conclude this.
>
>
> 1. Details for system "xenodermus" failure at stage subscriptionCheck, snapshot taken 2022-05-31 13:00:04
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=xenodermus&dt=2022-05-31%2013%3A00%3A04
>
>
> 2. Details for system "phycodurus" failure at stage subscriptionCheck, snapshot taken 2022-05-26 17:30:04
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=phycodurus&dt=2022-05-26%2017%3A30%3A04

I think, I've discovered what causes that test failure.
When playing with bgwriter during researching [1], I made it more
aggressive with a dirty hack:
-#define LOG_SNAPSHOT_INTERVAL_MS 15000
+#define LOG_SNAPSHOT_INTERVAL_MS 1

         rc = WaitLatch(MyLatch,
                        WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
-                       BgWriterDelay /* ms */ , WAIT_EVENT_BGWRITER_MAIN);
+                       1 /* ms */ , WAIT_EVENT_BGWRITER_MAIN);

With this modification, I ran `make check -C src/test/subscription` in a
loop and observed the same failure of 031_column_list as discussed here.
With log_min_messages = DEBUG2, I see that in a failed case there is no
'"sub1" has now caught up' message (as Amit and Tomas guessed upthread)
(See full logs from one successful and two failed runs attached (I added
some extra logging and enabled wal_debug to better understand what's going
on here).)

If we look at the failures occurred in the buildfarm:
The first two from the past:
1)
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=xenodermus&dt=2022-05-31%2013%3A00%3A04

2022-05-26 20:39:23.828 CEST [276284][postmaster][:0][] LOG: starting PostgreSQL 15beta1 on x86_64-pc-linux-gnu,
compiled by gcc-10 (Debian 10.3.0-15) 10.3.0, 64-bit
...
2022-05-26 20:39:39.768 CEST [277545][walsender][3/0:0][sub1] ERROR:  publication "pub6" does not exist

2)
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=phycodurus&dt=2022-05-26%2017%3A30%3A04

2022-05-31 16:33:25.506 CEST [3223685][postmaster][:0][] LOG: starting PostgreSQL 15beta1 on x86_64-pc-linux-gnu,
compiled by clang version 6.0.1 , 64-bit
...
2022-05-31 16:33:41.114 CEST [3224511][walsender][3/0:0][sub1] ERROR:  publication "pub6" does not exist

The other two from the present:
3)
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=kestrel&dt=2023-12-14%2009%3A14%3A52&stg=subscription-check

2023-12-14 09:26:12.523 UTC [1144979][postmaster][:0] LOG:  starting PostgreSQL 16.1 on x86_64-pc-linux-gnu, compiled by
Debian clang version 13.0.1-11+b2, 64-bit
...
2023-12-14 09:26:28.663 UTC [1157936][walsender][3/0:0] ERROR: publication "pub6" does not exist

4)
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mylodon&dt=2023-11-17%2018%3A28%3A24

2023-11-17 18:31:13.594 UTC [200939] LOG:  starting PostgreSQL 17devel on x86_64-linux, compiled by clang-14.0.6, 64-bit
...
2023-11-17 18:31:29.292 UTC [222103] sub1 ERROR:  publication "pub6" does not exist

we can see that all the failures occurred approximately since 16 seconds
after the server start. And it's very close to predefined
LOG_SNAPSHOT_INTERVAL_MS.

[1] https://www.postgresql.org/message-id/6f85667e-5754-5d35-dbf1-c83fe08c1e48%40gmail.com

Best regards,
Alexander

Attachment Content-Type Size
031_column_list-runs.7z application/x-7z-compressed 1.1 MB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrey M. Borodin 2024-01-16 18:17:40 Re: UUID v7
Previous Message David G. Johnston 2024-01-16 17:52:42 New Window Function: ROW_NUMBER_DESC() OVER() ?