Re: Stabilizing the test_decoding checks, take N

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Stabilizing the test_decoding checks, take N
Date: 2022-04-18 09:59:37
Message-ID: CAFiTN-vFrtusp4hYC+e06EAhK3OxZrr48d9VxXMmJkQi2ZFeNA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Apr 18, 2022 at 11:19 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
wrote:

> On Sat, Apr 16, 2022 at 10:42 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >
> > My pet dinosaur prairiedog just failed in the contrib/test_decoding
> > tests [1]:
> >
> > diff -U3
> /Users/buildfarm/bf-data/HEAD/pgsql.build/contrib/test_decoding/expected/stream.out
> /Users/buildfarm/bf-data/HEAD/pgsql.build/contrib/test_decoding/results/stream.out
> > ---
> /Users/buildfarm/bf-data/HEAD/pgsql.build/contrib/test_decoding/expected/stream.out
> 2022-04-15 07:59:17.000000000 -0400
> > +++
> /Users/buildfarm/bf-data/HEAD/pgsql.build/contrib/test_decoding/results/stream.out
> 2022-04-15 09:06:36.000000000 -0400
> > @@ -77,10 +77,12 @@
> > streaming change for transaction
> > streaming change for transaction
> > streaming change for transaction
> > + closing a streamed block for transaction
> > + opening a streamed block for transaction
> > streaming change for transaction
> > closing a streamed block for transaction
> > committing streamed transaction
> > -(13 rows)
> > +(15 rows)
> >
> > Looking at the postmaster log, it's obvious where this extra transaction
> > came from: auto-analyze ran on pg_type concurrently with the test step
> > just before this one. That could only happen if the tests ran long
> enough
> > for autovacuum_naptime to elapse, but prairiedog is a pretty slow
> machine.
> > (And I hasten to point out that some other animals, such as those running
> > valgrind or CLOBBER_CACHE_ALWAYS, are even slower.)
> >
> > We've seen this sort of problem before [2], and attempted to fix it [3]
> > by making these tests ignore empty transactions. But of course
> > auto-analyze's transaction wasn't empty, so that didn't help.
> >
>
> The possible reason here is that this extra (auto-analyze) transaction
> causes the logical decoding work mem to reach before the last change
> of the test's transaction. As can be seen from the logs, it just
> closed the stream before the last change and then opened a new stream
> for the last change. Now, it is true that the auto-analyze changes
> won't be decoded as they don't perform DML operation on any
> non-catalog table but it could generate some invalidation message
> which needs to be processed even though we won't send anything related
> to it to the downstream.
>

This analysis seems right to me.

> This needs to be verified once by doing some manual testing as it may
> not be easily reproducible every time. If this happens to be true then
> I think your suggestion related to increasing autovacuum_naptime would
> work.
>

I will try to reproduce this, maybe by reducing the autovacuum_naptime or
parallelly running some script that continuously performs DDL-only
transactions.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2022-04-18 10:45:07 Re: Crash in new pgstats code
Previous Message vignesh C 2022-04-18 09:40:46 Re: Skipping schema changes in publication