Re: Stabilizing the test_decoding checks, take N

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Stabilizing the test_decoding checks, take N
Date: 2022-04-17 14:32:36
Message-ID: 20220417143236.lp3hksv2h2z7qn6p@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Adding Amit, I think this is stuff he worked on...

On 2022-04-17 07:31:04 -0700, Andres Freund wrote:
> On 2022-04-16 13:11:59 -0400, Tom Lane wrote:
> > My pet dinosaur prairiedog just failed in the contrib/test_decoding
> > tests [1]:
> >
> > diff -U3 /Users/buildfarm/bf-data/HEAD/pgsql.build/contrib/test_decoding/expected/stream.out /Users/buildfarm/bf-data/HEAD/pgsql.build/contrib/test_decoding/results/stream.out
> > --- /Users/buildfarm/bf-data/HEAD/pgsql.build/contrib/test_decoding/expected/stream.out 2022-04-15 07:59:17.000000000 -0400
> > +++ /Users/buildfarm/bf-data/HEAD/pgsql.build/contrib/test_decoding/results/stream.out 2022-04-15 09:06:36.000000000 -0400
> > @@ -77,10 +77,12 @@
> > streaming change for transaction
> > streaming change for transaction
> > streaming change for transaction
> > + closing a streamed block for transaction
> > + opening a streamed block for transaction
> > streaming change for transaction
> > closing a streamed block for transaction
> > committing streamed transaction
> > -(13 rows)
> > +(15 rows)
> >
> > Looking at the postmaster log, it's obvious where this extra transaction
> > came from: auto-analyze ran on pg_type concurrently with the test step
> > just before this one. That could only happen if the tests ran long enough
> > for autovacuum_naptime to elapse, but prairiedog is a pretty slow machine.
> > (And I hasten to point out that some other animals, such as those running
> > valgrind or CLOBBER_CACHE_ALWAYS, are even slower.)
> >
> > We've seen this sort of problem before [2], and attempted to fix it [3]
> > by making these tests ignore empty transactions. But of course
> > auto-analyze's transaction wasn't empty, so that didn't help.
>
> I don't quite understand this bit - the logic test_decoding uses to
> decide if a transaction is "empty" is just whether a tuple was
> output. And there shouldn't be any as part of auto-analyze, because we
> don't decode catalog changes. I suspect there's something broken in the
> streaming logic (potentially just in test_decoding) around
> skip_empty_xacts.
>
>
> > I think the most expedient way to prevent this type of failure is to run
> > the test_decoding tests with autovacuum_naptime cranked up so far as to
> > make it a non-issue, like maybe a day. Since test_decoding already adds
> > some custom settings to postgresql.conf, this'll take just a one-line
> > addition to test_decoding/logical.conf.
>
> I'm a bit worried about this approach - we've IIRC had past bugs that
> came only to light because of autovacuum starting. I wonder if we rather
> should do the opposite and reduce naptime so it'll be seen on fast
> machines, rather than very slow ones.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2022-04-17 14:36:15 Re: Intermittent buildfarm failures on wrasse
Previous Message Andres Freund 2022-04-17 14:31:04 Re: Stabilizing the test_decoding checks, take N