Re: Intermittent buildfarm failures on wrasse

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Andres Freund <andres(at)anarazel(dot)de>, David Rowley <dgrowleyml(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Intermittent buildfarm failures on wrasse
Date: 2022-04-13 23:38:06
Message-ID: 1364997.1649893086@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Peter Geoghegan <pg(at)bowt(dot)ie> writes:
> On Wed, Apr 13, 2022 at 4:13 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>> IIRC the problem in matter isn't skipped pages, but that the horizon simply isn't new enough to mark pages as all visible.

> Sometimes OldestXmin can go backwards in VACUUM operations that are
> run in close succession against the same table, due to activity from
> other databases in the same cluster (perhaps other factors are
> involved at times).

I've been doing some testing locally by inserting commands to
manually set tenk1's relallvisible to zero. I first did that
in test_setup.sql ... and it had no effect whatsoever. Further
experimentation showed that the "CREATE INDEX ON tenk1" steps
in create_index.sql itself generally suffice to fix relallvisible;
although if you force it back to zero after the last such command,
you get the same plan diffs wrasse is showing. And you don't
get any others, which I thought curious until I realized that
sanity_check.sql's database-wide VACUUM offers yet another
opportunity to heal the incorrect value. If you force it back
to zero again after that, a bunch of later tests start to show
plan differences, which is what I'd been expecting.

So what seems to be happening on wrasse is that a background
autovacuum (or really autoanalyze?) is preventing pages from
being marked all-visible not only during test_setup.sql but
also create_index.sql; but it's gone by the time sanity_check.sql
runs. Which is odd in itself because not that much time elapses
between create_index and sanity_check, certainly less than the
time from test_setup to create_index.

It seems like a reliable fix might require test_setup to wait
for any background autovac to exit before it does its own
vacuums. Ick.

And we still lack an explanation of why this only now broke.
I remain suspicious that pgstats is behaving unexpectedly.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2022-04-13 23:45:44 Re: Intermittent buildfarm failures on wrasse
Previous Message David G. Johnston 2022-04-13 23:34:32 Re: shared-memory based stats collector - v70