RE: BF animal dikkop reported a failure in 035_standby_logical_decoding

From: "Yu Shi (Fujitsu)" <shiy(dot)fnst(at)fujitsu(dot)com>
To: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "tomas(dot)vondra(at)enterprisedb(dot)com" <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>
Subject: RE: BF animal dikkop reported a failure in 035_standby_logical_decoding
Date: 2023-05-29 09:58:13
Message-ID: OSZPR01MB6310CFFD7D0DCD60A05DB1C3FD4A9@OSZPR01MB6310.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Monday, May 29, 2023 5:22 PM Drouvot, Bertrand <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
>
> Hi,
>
> On 5/26/23 9:27 AM, Yu Shi (Fujitsu) wrote:
> > Hi hackers,
> >
> > I saw a buildfarm failure on "dikkop"[1]. It failed in
> > 035_standby_logical_decoding.pl, because the slots row_removal_inactiveslot
> and
> > row_removal_activeslot are not invalidated after vacuum.
>
> Thanks for sharing!
>
> >
> > regress_log_035_standby_logical_decoding:
> > ```
> > [12:15:05.943](4.442s) not ok 22 - inactiveslot slot invalidation is logged with
> vacuum on pg_class
> > [12:15:05.945](0.003s)
> > [12:15:05.946](0.000s) # Failed test 'inactiveslot slot invalidation is logged
> with vacuum on pg_class'
> > # at t/035_standby_logical_decoding.pl line 238.
> > [12:15:05.948](0.002s) not ok 23 - activeslot slot invalidation is logged with
> vacuum on pg_class
> > [12:15:05.949](0.001s)
> > [12:15:05.950](0.000s) # Failed test 'activeslot slot invalidation is logged with
> vacuum on pg_class'
> > # at t/035_standby_logical_decoding.pl line 244.
> > [13:38:26.977](5001.028s) # poll_query_until timed out executing this query:
> > # select (confl_active_logicalslot = 1) from pg_stat_database_conflicts where
> datname = 'testdb'
> > # expecting this output:
> > # t
> > # last actual query output:
> > # f
> > # with stderr:
> > [13:38:26.980](0.003s) not ok 24 - confl_active_logicalslot updated
> > [13:38:26.982](0.002s)
> > [13:38:26.982](0.000s) # Failed test 'confl_active_logicalslot updated'
> > # at t/035_standby_logical_decoding.pl line 251.
> > Timed out waiting confl_active_logicalslot to be updated at
> t/035_standby_logical_decoding.pl line 251.
> > ```
> >
> > 035_standby_logical_decoding.pl:
> > ```
> > # This should trigger the conflict
> > $node_primary->safe_psql(
> > 'testdb', qq[
> > CREATE TABLE conflict_test(x integer, y text);
> > DROP TABLE conflict_test;
> > VACUUM pg_class;
> > INSERT INTO flush_wal DEFAULT VALUES; -- see create table flush_wal
> > ]);
> >
> > $node_primary->wait_for_replay_catchup($node_standby);
> >
> > # Check invalidation in the logfile and in pg_stat_database_conflicts
> > check_for_invalidation('row_removal_', $logstart, 'with vacuum on pg_class');
> > ```
> >
> > Is it possible that the vacuum command didn't remove tuples and then the
> > conflict was not triggered?
>
> The flush_wal table added by Andres should guarantee that the WAL is flushed,
> so
> the only reason I can think about is indeed that the vacuum did not remove
> tuples (
> but I don't get why/how that could be the case).
>
> > It seems we can't confirm this because there is not
> > enough information.
>
> Right, and looking at its status history most of the time the test is green (making
> it
> even more difficult to diagnose).
>
> > Maybe "vacuum verbose" can be used to provide more
> > information.
>
> I can see that dikkop "belongs" to Tomas (adding Tomas to this thread).
> Tomas, do you think it would be possible to run some
> 035_standby_logical_decoding.pl
> manually with "vacuum verbose" in the test mentioned above? (or any other
> way you can think
> about that would help diagnose this random failure?).
>

Thanks for your reply.

I saw another failure on "drongo" [1], which looks like a similar problem.

Maybe a temporary patch can be committed to dump the result of "vacuum verbose".
And we can check this when the test fails.

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2023-05-26%2018%3A05%3A57

Regards,
Shi Yu

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2023-05-29 11:03:05 Re: BF animal dikkop reported a failure in 035_standby_logical_decoding
Previous Message Drouvot, Bertrand 2023-05-29 09:22:01 Re: BF animal dikkop reported a failure in 035_standby_logical_decoding