Re: Why is parula failing?

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, tharar(at)amazon(dot)com
Subject: Re: Why is parula failing?
Date: 2024-03-20 10:50:10
Message-ID: CAEze2WhLWXCf-Sr8-rA_BkeYX40Ch2F632QN8BtGqciwKQQU-g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 19 Mar 2024 at 20:58, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> For the last few days, buildfarm member parula has been intermittently
> failing the partition_prune regression test, due to unexpected plan
> changes [1][2][3][4]. The symptoms can be reproduced exactly by
> inserting a "vacuum" of one or another of the partitions of table
> "ab", so we can presume that the underlying cause is an autovacuum run
> against one of those tables. But why is that happening? None of
> those tables receive any insertions during the test, so I don't
> understand why autovacuum would trigger on them.
>
> I suppose we could attach "autovacuum=off" settings to these tables,
> but it doesn't seem to me that that should be necessary. These test
> cases are several years old and haven't given trouble before.
> Moreover, if that's necessary then there are a lot of other regression
> tests that would presumably need the same treatment.
>
> I'm also baffled why this wasn't happening before. I scraped the
> buildfarm logs for 3 months back and confirmed my impression that
> this is a new failure mode. But one of these four runs is on
> REL_14_STABLE, eliminating the theory that the cause is a recent
> HEAD-only change.
>
> Any ideas?

This may be purely coincidental, but yesterday I also did have a
seemingly random failure in the recovery test suite locally, in
t/027_stream_regress.pl, where it changed the join order of exactly
one of the queries (that uses the tenk table multiple times, iirc 3x
or so). As the work I was doing wasn't related to join order-related
problems, this surprised me. After checking for test file changes
(none), I re-ran the tests without recompilation and the failure went
away, so I attributed this to an untimely autoanalize. However, as
this was also an unexpected plan change in the tests this could be
related.

Sadly, I did not save the output of that test run, so this is just
about all the information I have. The commit I was testing on was
based on ca108be7, and system config is available if needed.

-Matthias

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message jian he 2024-03-20 10:55:14 Re: SQL:2011 application time
Previous Message Shubham Khanna 2024-03-20 10:16:13 Re: speed up a logical replica setup