Re: Why is subscription/t/031_column_list.pl failing so much?

From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Noah Misch <noah(at)leadboat(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Why is subscription/t/031_column_list.pl failing so much?
Date: 2024-02-03 07:00:00
Message-ID: a1010953-7d65-225a-0afa-f11de5d5e011@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello Tom and Noah,

03.02.2024 04:24, Tom Lane wrote:
> I'm still wondering how come the failure seems to have suddenly gotten
> way more common. The only changes that are in vaguely-related places
> and fit the time frame are Amit's 732924043 and 776621a5e, but I sure
> don't see a connection.

I think the failure rate increased due to tamandua, calliphoridae,
flaviventris, and kestrel were switched from make to meson recently.
The last `make` builds for them:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tamandua&dt=2024-01-31%2016%3A51%3A31
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=calliphoridae&dt=2024-01-31%2016%3A51%3A38
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=flaviventris&dt=2024-01-31%2016%3A52%3A37
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kestrel&dt=2024-01-31%2016%3A51%3A53

and since that switch the 031_column_list duration increased significantly,
e.g., on the last tamandua `make` run it executed for 7 seconds, but
successful `meson` runs give much longer duration:
280/283 postgresql:subscription / subscription/031_column_list                    OK 38.27s   36 subtests passed
280/283 postgresql:subscription / subscription/031_column_list                    OK 126.13s   36 subtests passed
280/283 postgresql:subscription / subscription/031_column_list                    OK 31.93s   36 subtests passed
279/283 postgresql:subscription / subscription/031_column_list                    OK 99.76s   36 subtests passed

So, looking at the tamandua's failure log, I see:
2024-02-02 19:41:19.750 UTC [1579219][postmaster][:0] LOG:  starting PostgreSQL 17devel on x86_64-linux, compiled by
gcc-12.3.0, 64-bit
...
2024-02-02 19:42:19.973 UTC [1629333][client backend][4/213:0] LOG: statement: ALTER SUBSCRIPTION sub1 SET PUBLICATION pub7
2024-02-02 19:42:20.131 UTC [1625765][logical replication apply worker][3/122:0] LOG:  logical replication worker for
subscription "sub1" will restart because of a parameter change
2024-02-02 19:42:20.137 UTC [1629333][client backend][:0] LOG: disconnection: session time: 0:00:00.212 user=bf
database=postgres host=[local]
2024-02-02 19:42:20.191 UTC [1629535][logical replication apply worker][3/124:0] LOG:  logical replication apply worker
for subscription "sub1" has started
...
2024-02-02 19:42:20.445 UTC [1629535][logical replication apply worker][3/0:0] ERROR:  could not receive data from WAL
stream: ERROR:  publication "pub7" does not exist
    CONTEXT:  slot "sub1", output plugin "pgoutput", in the change callback, associated LSN 0/15C7698

(The interval between subscriber start and the error is ~ 4 * 15 seconds.)

Thus it still may be explained by bgwriter activity, though perhaps
autovacuum/checkpointer can add something as well.

Best regards,
Alexander

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hannu Krosing 2024-02-03 07:54:16 Re: pgbench - adding pl/pgsql versions of tests
Previous Message Amit Kapila 2024-02-03 06:47:08 Re: Documentation: warn about two_phase when altering a subscription