Re: Bug in amcheck?

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Alexander Lakhin <exclusion(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Mihail Nikalayeu <mihailnikalayeu(at)gmail(dot)com>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, aekorotkov(at)gmail(dot)com, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Bug in amcheck?
Date: 2026-01-16 12:41:41
Message-ID: 88c727b2-1c65-4ee9-8bed-48a4813818dd@iki.fi
Lists: pgsql-hackers

On 16/01/2026 08:00, Alexander Lakhin wrote:
> 03.01.2026 04:40, Tom Lane wrote:
>> In the past couple of days, scorpion and skink have failed
>> the nbtree_half_dead_pages test with identical symptoms [1][2]:
>> ...
>> [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=scorpion&dt=2026-01-02%2004%3A54%3A38
>> [2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2025-12-31%2003%3A34%3A51
>
> I reproduced such failures locally (when running multiple test
> instances under Valgrind concurrently) and discovered that the test might
> fail due to autovacuum activity. (Apparently because
> heap_prune_satisfies_vacuum() returns HEAPTUPLE_RECENTLY_DEAD, not
> HEAPTUPLE_DEAD for tuples in question, so prune_freeze_plan()/
> heap_page_prune_and_freeze() finds 0 lpdead_items.)
>
> pgsql.build/testrun/nbtree/regress/log/postmaster.log in [2] contains:
> 2025-12-31 06:00:41.778 CET autovacuum worker[2250984] LOG: automatic analyze of table "template1.information_schema.sql_features"
>
> (The postmaster log is missing in [1] for some reason...)
>
> I've also managed to reproduce this just with the attached patch and:
> echo "autovacuum_naptime = 1" > /tmp/temp.config
> TEMP_CONFIG=/tmp/temp.config make -s check -C src/test/modules/nbtree
>
> ok 86 - nbtree_half_dead_pages 319 ms
> not ok 87 - nbtree_half_dead_pages 324 ms
> ok 88 - nbtree_half_dead_pages 326 ms
> ...
> # 1 of 101 tests failed.

Great, thanks! I was able to reproduce it readily by adding a delay to
auto-analyze (you still need to run the test around 5 times in a row for
auto-analyze to kick in):

diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index aa4fbec143f..4f91ce84786 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -645,6 +645,8 @@ vacuum(List *relations, const VacuumParams params, BufferAccessStrategy bstrateg
StartTransactionCommand();
/* functions in indexes may want a snapshot set */
PushActiveSnapshot(GetTransactionSnapshot());
+ if (AmAutoVacuumWorkerProcess())
+ pg_usleep(1000000);
}

analyze_rel(vrel->oid, vrel->relation, params,

Pushed a fix that uses a small helper procedure to wait until any snapshots
holding back the vacuum horizon have finished. It's the same approach as in
the syscache-update-pruned test.
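To make the mechanism concrete, here is a toy C model. It is not PostgreSQL code: every name in it (ToyXid, toy_horizon, toy_classify, toy_wait_for_horizon) is hypothetical. It assumes a simplified rule where a tuple deleted by a committed transaction is DEAD only if the deleting XID precedes the oldest xmin of any running snapshot (XID wraparound ignored). An auto-analyze snapshot with an old xmin therefore keeps the tuple RECENTLY_DEAD, so pruning finds no LP_DEAD items, and waiting for that snapshot to finish lets the horizon advance.

```c
#include <stdint.h>

/* Toy transaction IDs; XID wraparound is deliberately ignored. */
typedef uint32_t ToyXid;

typedef enum
{
    TOY_DEAD,           /* no running snapshot can see the tuple: prunable */
    TOY_RECENTLY_DEAD   /* some snapshot might still see it: must be kept */
} ToyVisibility;

/*
 * Hypothetical horizon: the oldest xmin among running snapshots,
 * or next_xid when no snapshot is running.
 */
static ToyXid
toy_horizon(const ToyXid *xmins, int n, ToyXid next_xid)
{
    ToyXid      horizon = next_xid;

    for (int i = 0; i < n; i++)
        if (xmins[i] < horizon)
            horizon = xmins[i];
    return horizon;
}

/*
 * Hypothetical simplification of the DEAD vs. RECENTLY_DEAD decision
 * for a tuple whose deleting transaction has committed.
 */
static ToyVisibility
toy_classify(ToyXid deleter_xid, ToyXid horizon)
{
    return deleter_xid < horizon ? TOY_DEAD : TOY_RECENTLY_DEAD;
}

/*
 * Hypothetical wait loop: poll until the horizon has moved past target_xid.
 * Each poll retires the oldest snapshot, standing in for "sleep, then check
 * again" in a real test. Returns the number of polls taken.
 */
static int
toy_wait_for_horizon(ToyXid *xmins, int *n, ToyXid next_xid, ToyXid target_xid)
{
    int         polls = 0;

    while (*n > 0 && toy_horizon(xmins, *n, next_xid) <= target_xid)
    {
        int         oldest = 0;

        for (int i = 1; i < *n; i++)
            if (xmins[i] < xmins[oldest])
                oldest = i;

        /* The oldest snapshot finishes: drop it from the array. */
        xmins[oldest] = xmins[*n - 1];
        (*n)--;
        polls++;
    }
    return polls;
}
```

For example, with running-snapshot xmins {100, 120}, next XID 130 and a tuple deleted by XID 110, toy_classify() reports RECENTLY_DEAD until one simulated poll retires the xmin-100 snapshot; the horizon then becomes 120 and the tuple is DEAD. A real test can only observe and sleep; the simulation retires snapshots itself purely to keep the sketch deterministic.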

- Heikki
