Stabilizing the test_decoding checks, take N

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Stabilizing the test_decoding checks, take N
Date: 2022-04-16 17:11:59
Message-ID: 1958043.1650129119@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

My pet dinosaur prairiedog just failed in the contrib/test_decoding
tests [1]:

diff -U3 /Users/buildfarm/bf-data/HEAD/pgsql.build/contrib/test_decoding/expected/stream.out /Users/buildfarm/bf-data/HEAD/pgsql.build/contrib/test_decoding/results/stream.out
--- /Users/buildfarm/bf-data/HEAD/pgsql.build/contrib/test_decoding/expected/stream.out 2022-04-15 07:59:17.000000000 -0400
+++ /Users/buildfarm/bf-data/HEAD/pgsql.build/contrib/test_decoding/results/stream.out 2022-04-15 09:06:36.000000000 -0400
@@ -77,10 +77,12 @@
streaming change for transaction
streaming change for transaction
streaming change for transaction
+ closing a streamed block for transaction
+ opening a streamed block for transaction
streaming change for transaction
closing a streamed block for transaction
committing streamed transaction
-(13 rows)
+(15 rows)

Looking at the postmaster log, it's obvious where this extra transaction
came from: auto-analyze ran on pg_type concurrently with the test step
just before this one. That could only happen if the tests ran long enough
for autovacuum_naptime to elapse, but prairiedog is a pretty slow machine.
(And I hasten to point out that some other animals, such as those running
valgrind or CLOBBER_CACHE_ALWAYS, are even slower.)

We've seen this sort of problem before [2], and attempted to fix it [3]
by making these tests ignore empty transactions. But of course
auto-analyze's transaction wasn't empty, so that didn't help.

I think the most expedient way to prevent this type of failure is to run
the test_decoding tests with autovacuum_naptime cranked up so far as to
make it a non-issue, like maybe a day. Since test_decoding already adds
some custom settings to postgresql.conf, this'll take just a one-line
addition to test_decoding/logical.conf.

I wonder whether we ought to then revert these tests' use of
skip-empty-xacts, or at least start having a mix of cases.
It seems to me that we'd rather know about it if there are unexpected
empty transactions. Is there anything we're using that for other than
to hide the effects of autovacuum?

regards, tom lane

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog&dt=2022-04-15%2011%3A59%3A16

[2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2022-02-12%2010%3A24%3A22

[3] https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=b779d7d8fdae088d70da5ed9fcd8205035676df3

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2022-04-16 18:23:23 Re: TRAP: FailedAssertion("tabstat->trans == trans", File: "pgstat_relation.c", Line: 508
Previous Message dl x 2022-04-16 17:06:41 GSoC: pgmoneta: Write-Ahead Log (WAL) infrastructure (2022)