Re: Intermittent buildfarm failures on wrasse

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, David Rowley <dgrowleyml(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Intermittent buildfarm failures on wrasse
Date: 2022-04-14 16:18:50
Message-ID: 20220414161850.ibqkevxcv6a7vdxg@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2022-04-14 12:01:23 -0400, Tom Lane wrote:
> Noah Misch <noah(at)leadboat(dot)com> writes:
> > On Wed, Apr 13, 2022 at 06:51:12PM -0700, Andres Freund wrote:
> >> Noah, any chance you could enable log_autovacuum_min_duration=0 on
> >> wrasse?
>
> > Done. Also forced hourly builds.

Thanks! Can you repro the problem manually on wrasse, perhaps even
outside the buildfarm script? That might be simpler than debugging via
the BF...

> Thanks! We now have two failing runs with the additional info [1][2],
> and in both, it's clear that the first autovac worker doesn't launch
> until 1 minute after postmaster start, by which time we're long done
> with the test scripts of interest. So whatever is breaking this is
> not an autovac worker.

I did some experiments around that too, and didn't find any related
problems.

For a second I was wondering if it's caused by the time of initdb (which
ends up with a working pgstat snapshot now, but didn't before), but
that's just a few more seconds. While the BF scripts don't show
timestamps for initdb, the previous step's log output confirms that it's
just a few seconds...

> I think I'm going to temporarily add a couple of queries to check
> what tenk1's relallvisible actually is, just so we can confirm
> positively that that's what's causing the plan change. (I'm also
> curious about whether the CREATE INDEX steps manage to change it
> at all.)

I wonder if we should make VACUUM log the VERBOSE output at DEBUG1
unconditionally. This is like the third bug where we needed that
information, and it's practically impossible to include in regression
output. Then we'd know what the xid horizon is, whether pages were
skipped, etc.

It also just generally seems like a good thing.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2022-04-14 16:21:54 Re: Intermittent buildfarm failures on wrasse
Previous Message Robert Haas 2022-04-14 16:16:45 Re: TRAP: FailedAssertion("HaveRegisteredOrActiveSnapshot()", File: "toast_internals.c", Line: 670, PID: 19403)