| From: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
|---|---|
| To: | Alexander Lakhin <exclusion(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
| Cc: | Peter Geoghegan <pg(at)bowt(dot)ie>, Mihail Nikalayeu <mihailnikalayeu(at)gmail(dot)com>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, aekorotkov(at)gmail(dot)com, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Bug in amcheck? |
| Date: | 2026-01-16 12:41:41 |
| Message-ID: | 88c727b2-1c65-4ee9-8bed-48a4813818dd@iki.fi |
| Lists: | pgsql-hackers |
On 16/01/2026 08:00, Alexander Lakhin wrote:
> 03.01.2026 04:40, Tom Lane wrote:
>> In the past couple of days, scorpion and skink have failed
>> the nbtree_half_dead_pages test with identical symptoms [1][2]:
>> ...
>> [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=scorpion&dt=2026-01-02%2004%3A54%3A38
>> [2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2025-12-31%2003%3A34%3A51
>
> I reproduced such failures locally (when running multiple test
> instances under Valgrind concurrently) and discovered that the test might
> fail due to autovacuum activity. (Apparently because
> heap_prune_satisfies_vacuum() returns HEAPTUPLE_RECENTLY_DEAD, not
> HEAPTUPLE_DEAD for tuples in question, so prune_freeze_plan()/
> heap_page_prune_and_freeze() finds 0 lpdead_items.)
>
> pgsql.build/testrun/nbtree/regress/log/postmaster.log in [2] contains:
> 2025-12-31 06:00:41.778 CET autovacuum worker[2250984] LOG: automatic
> analyze of table "template1.information_schema.sql_features"
>
> (The postmaster log is missing in [1] for some reason...)
>
> I've also managed to reproduce this just with the attached patch and:
> echo "autovacuum_naptime = 1" > /tmp/temp.config
> TEMP_CONFIG=/tmp/temp.config make -s check -C src/test/modules/nbtree
>
> ok 86 - nbtree_half_dead_pages 319 ms
> not ok 87 - nbtree_half_dead_pages 324 ms
> ok 88 - nbtree_half_dead_pages 326 ms
> ...
> # 1 of 101 tests failed.
Great, thanks! I was able to readily reproduce it by adding a delay to
auto-analyze (you still need to run the test around 5 times in a row for
auto-analyze to kick in):
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index aa4fbec143f..4f91ce84786 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -645,6 +645,8 @@ vacuum(List *relations, const VacuumParams params, BufferAccessStrategy bstrateg
         StartTransactionCommand();
         /* functions in indexes may want a snapshot set */
         PushActiveSnapshot(GetTransactionSnapshot());
+        if (AmAutoVacuumWorkerProcess())
+            pg_usleep(1000000);
     }
 
     analyze_rel(vrel->oid, vrel->relation, params,
Pushed a fix using a little helper procedure to wait for snapshots
holding back the vacuum horizon to finish. It's the same approach as in
the syscache-update-pruned test.
- Heikki
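
The "wait for snapshots holding back the vacuum horizon" idea mentioned above can be modeled as a simple polling loop. The following is only a minimal sketch of that pattern in standalone C, not the helper that was committed: wait_for_horizon(), fake_horizon() and the horizon_fn callback are hypothetical names, and the horizon is simulated by a counter rather than read from a server.

```c
#include <stdint.h>

/*
 * Hypothetical callback returning the current removal horizon, i.e. the
 * oldest transaction ID that some snapshot may still need to see.
 */
typedef uint64_t (*horizon_fn)(void *state);

/*
 * Poll until the horizon has advanced to at least `barrier`, which stands
 * for the transaction ID whose dead tuples the test wants vacuum to remove.
 * Returns the number of polls performed, or -1 if `max_polls` is exhausted.
 */
static int
wait_for_horizon(horizon_fn get_horizon, void *state,
                 uint64_t barrier, int max_polls)
{
    for (int polls = 1; polls <= max_polls; polls++)
    {
        if (get_horizon(state) >= barrier)
            return polls;
        /* A real test would sleep briefly here before polling again. */
    }
    return -1;
}

/*
 * Simulated horizon (assumption for illustration): returns the current
 * value and then advances it by one, as if an old snapshot were slowly
 * going away.
 */
static uint64_t
fake_horizon(void *state)
{
    uint64_t *horizon = state;

    return (*horizon)++;
}
```

With a simulated horizon starting at 97 and a barrier of 100, the loop returns on the fourth poll. The point of the pattern is that the test only proceeds to VACUUM once nothing (such as a concurrent auto-analyze) can still keep the tuples from being classified as dead.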