Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Cc: Tomas Vondra <tomas(at)vondra(dot)me>, David Rowley <dgrowleyml(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Xuneng Zhou <xunengzhou(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
Date: 2026-04-20 19:00:00
Message-ID: 71277259-264e-4983-a201-938b404049d7@gmail.com

Hello Melanie and Andres,

20.04.2026 19:18, Melanie Plageman wrote:
>> but I wonder if there are other queries,
>> which plans can change due to the same reason.
> I think we'll have to take this on a case-by-case basis when we see
> failures. While it is certainly possible other tests just rely on
> autovacuum not having run and set the page all-visible, many of them
> probably have already had to account for that.

Thank you for paying attention to this!

I think I found another test that suffers from autoanalyze with the new
behavior: [1], [2].

Initially I reproduced this diff on a slow armv7 device after many
iterations of `make check` with:
autovacuum_naptime = 1
autovacuum_analyze_threshold = 1
debug_parallel_query = 'regress'

But now I see that it can be reproduced on an ordinary machine with just:
--- a/src/test/regress/sql/plancache.sql
+++ b/src/test/regress/sql/plancache.sql
@@ -208,2 +208,3 @@ execute test_mode_pp(1); -- 2x
 execute test_mode_pp(1); -- 3x
+analyze test_mode;
 execute test_mode_pp(1); -- 4x
(and expected/plancache.out updated)

and `make check` running in a loop. It failed for me on iterations 5, 4,
and 10 (as far as I can see, analyze does not update relallvisible every
time):
# parallel group (18 tests):  prepare xml conversion plancache limit returning copy2 polymorphism sequence rowtypes largeobject temp rangefuncs with truncate domain plpgsql alter_table
# diff -U3 .../src/test/regress/expected/plancache.out .../src/test/regress/results/plancache.out
# --- .../src/test/regress/expected/plancache.out    2026-04-20 21:35:30.677775398 +0300
# +++ .../src/test/regress/results/plancache.out     2026-04-20 21:43:49.324492302 +0300
# @@ -374,11 +374,11 @@
#
#  -- we should now get a really bad plan
#  explain (costs off) execute test_mode_pp(2);
# -         QUERY PLAN
# ------------------------------
# +                        QUERY PLAN
# +----------------------------------------------------------
#   Aggregate
# -   ->  Seq Scan on test_mode
# -         Filter: (a = $1)
# +   ->  Index Only Scan using test_mode_a_idx on test_mode
# +         Index Cond: (a = $1)
#  (3 rows)
#
#  -- but we can force a custom plan

The same modified test survived 50 iterations at 378a21618~1.
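
For anyone who wants to repeat this, the loop can be sketched roughly as
below. This is only a minimal sketch, not the exact script I used: the
helper name run_until_failure is hypothetical, and it assumes the command
returns a non-zero exit status on failure (as `make check` does).

```shell
# Hypothetical helper (not part of the PostgreSQL tree): run a command up
# to N times and stop at the first failing iteration.
# Usage: run_until_failure N command [args...]
run_until_failure() {
    max=$1
    shift
    i=1
    while [ "$i" -le "$max" ]; do
        # Discard the command's own output; only the exit status matters.
        if ! "$@" >/dev/null 2>&1; then
            echo "failed on iteration $i"
            return 1
        fi
        i=$((i + 1))
    done
    echo "survived $max iterations"
    return 0
}
```

For example, `run_until_failure 50 make check` from the top of the build
tree stops at the first failing iteration; the diffs are then left in
src/test/regress/regression.diffs.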

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dodo&dt=2026-04-07%2012%3A45%3A07
[2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dodo&dt=2026-04-12%2022%3A45%3A06

Best regards,
Alexander
