Re: Stabilizing the test_decoding checks, take N

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Stabilizing the test_decoding checks, take N
Date: 2022-04-18 05:49:14
Message-ID: CAA4eK1LCDapkR8ckRNCkvURGKsM8AvAHZEtRab_98teY5H_Z+g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Apr 16, 2022 at 10:42 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> My pet dinosaur prairiedog just failed in the contrib/test_decoding
> tests [1]:
>
> diff -U3 /Users/buildfarm/bf-data/HEAD/pgsql.build/contrib/test_decoding/expected/stream.out /Users/buildfarm/bf-data/HEAD/pgsql.build/contrib/test_decoding/results/stream.out
> --- /Users/buildfarm/bf-data/HEAD/pgsql.build/contrib/test_decoding/expected/stream.out 2022-04-15 07:59:17.000000000 -0400
> +++ /Users/buildfarm/bf-data/HEAD/pgsql.build/contrib/test_decoding/results/stream.out 2022-04-15 09:06:36.000000000 -0400
> @@ -77,10 +77,12 @@
> streaming change for transaction
> streaming change for transaction
> streaming change for transaction
> + closing a streamed block for transaction
> + opening a streamed block for transaction
> streaming change for transaction
> closing a streamed block for transaction
> committing streamed transaction
> -(13 rows)
> +(15 rows)
>
> Looking at the postmaster log, it's obvious where this extra transaction
> came from: auto-analyze ran on pg_type concurrently with the test step
> just before this one. That could only happen if the tests ran long enough
> for autovacuum_naptime to elapse, but prairiedog is a pretty slow machine.
> (And I hasten to point out that some other animals, such as those running
> valgrind or CLOBBER_CACHE_ALWAYS, are even slower.)
>
> We've seen this sort of problem before [2], and attempted to fix it [3]
> by making these tests ignore empty transactions. But of course
> auto-analyze's transaction wasn't empty, so that didn't help.
>

The possible reason here is that this extra (auto-analyze) transaction
causes the logical decoding work mem to reach before the last change
of the test's transaction. As can be seen from the logs, it just
closed the stream before the last change and then opened a new stream
for the last change. Now, it is true that the auto-analyze changes
won't be decoded as they don't perform DML operation on any
non-catalog table but it could generate some invalidation message
which needs to be processed even though we won't send anything related
to it to the downstream.

This needs to be verified once by doing some manual testing as it may
not be easily reproducible every time. If this happens to be true then
I think your suggestion related to increasing autovacuum_naptime would
work.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tatsuo Ishii 2022-04-18 05:52:05 Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
Previous Message Michael Paquier 2022-04-18 04:55:13 Re: pgsql: Add TAP test for archive_cleanup_command and recovery_end_comman