Re: BF animal dikkop reported a failure in 035_standby_logical_decoding

From: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>
To: "Yu Shi (Fujitsu)" <shiy(dot)fnst(at)fujitsu(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, tomas(dot)vondra(at)enterprisedb(dot)com
Cc: Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: BF animal dikkop reported a failure in 035_standby_logical_decoding
Date: 2023-05-29 09:22:01
Message-ID: 602cfe08-dcca-39d7-5935-89c517622b4f@gmail.com
Lists: pgsql-hackers

Hi,

On 5/26/23 9:27 AM, Yu Shi (Fujitsu) wrote:
> Hi hackers,
>
> I saw a buildfarm failure on "dikkop"[1]. It failed in
> 035_standby_logical_decoding.pl, because the slots row_removal_inactiveslot and
> row_removal_activeslot are not invalidated after vacuum.

Thanks for sharing!

>
> regress_log_035_standby_logical_decoding:
> ```
> [12:15:05.943](4.442s) not ok 22 - inactiveslot slot invalidation is logged with vacuum on pg_class
> [12:15:05.945](0.003s)
> [12:15:05.946](0.000s) # Failed test 'inactiveslot slot invalidation is logged with vacuum on pg_class'
> # at t/035_standby_logical_decoding.pl line 238.
> [12:15:05.948](0.002s) not ok 23 - activeslot slot invalidation is logged with vacuum on pg_class
> [12:15:05.949](0.001s)
> [12:15:05.950](0.000s) # Failed test 'activeslot slot invalidation is logged with vacuum on pg_class'
> # at t/035_standby_logical_decoding.pl line 244.
> [13:38:26.977](5001.028s) # poll_query_until timed out executing this query:
> # select (confl_active_logicalslot = 1) from pg_stat_database_conflicts where datname = 'testdb'
> # expecting this output:
> # t
> # last actual query output:
> # f
> # with stderr:
> [13:38:26.980](0.003s) not ok 24 - confl_active_logicalslot updated
> [13:38:26.982](0.002s)
> [13:38:26.982](0.000s) # Failed test 'confl_active_logicalslot updated'
> # at t/035_standby_logical_decoding.pl line 251.
> Timed out waiting confl_active_logicalslot to be updated at t/035_standby_logical_decoding.pl line 251.
> ```
>
> 035_standby_logical_decoding.pl:
> ```
> # This should trigger the conflict
> $node_primary->safe_psql(
> 'testdb', qq[
> CREATE TABLE conflict_test(x integer, y text);
> DROP TABLE conflict_test;
> VACUUM pg_class;
> INSERT INTO flush_wal DEFAULT VALUES; -- see create table flush_wal
> ]);
>
> $node_primary->wait_for_replay_catchup($node_standby);
>
> # Check invalidation in the logfile and in pg_stat_database_conflicts
> check_for_invalidation('row_removal_', $logstart, 'with vacuum on pg_class');
> ```
>
> Is it possible that the vacuum command didn't remove tuples and then the
> conflict was not triggered?

The flush_wal table added by Andres should guarantee that the WAL is flushed, so
the only reason I can think of is indeed that the vacuum did not remove the tuples
(though I don't see why/how that could be the case).

> It seems we can't confirm this because there is not
> enough information.

Right, and looking at its status history, the test is green most of the time (which makes
the failure even more difficult to diagnose).

> Maybe "vacuum verbose" can be used to provide more
> information.

I can see that dikkop "belongs" to Tomas (adding Tomas to this thread).
Tomas, do you think it would be possible to run 035_standby_logical_decoding.pl
manually a few times with "vacuum verbose" in the test mentioned above (or in any other way
you can think of that would help diagnose this random failure)?
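
To make the suggestion concrete, here is a minimal sketch of what the SQL in that
test step could look like. It is the same statement list as in the quoted test, and
the only assumed change is the added VERBOSE keyword; it has not been run on dikkop
and the flush_wal table is the one the test already creates.

```sql
-- Same statements as the quoted test step; only VERBOSE is added (assumed change).
CREATE TABLE conflict_test(x integer, y text);
DROP TABLE conflict_test;
-- VERBOSE makes VACUUM report, for the table, how many dead tuples were removed
-- and how many were dead but not yet removable.
VACUUM VERBOSE pg_class;
INSERT INTO flush_wal DEFAULT VALUES; -- see create table flush_wal
```

If the report shows tuples that are dead but not yet removable, that would support
the theory that the vacuum did not remove the rows and so never triggered the conflict.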

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
