From: | Melanie Plageman <melanieplageman(at)gmail(dot)com> |
---|---|
To: | Andrey Borodin <x4mmm(at)yandex-team(dot)ru> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com> |
Subject: | Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) |
Date: | 2025-07-31 22:58:11 |
Message-ID: | CAAKRu_ZH8kL0Zm0j7m7DC9fzk7ru7yf9rm2pEQRvx1iXX25aPQ@mail.gmail.com |
Lists: | pgsql-hackers |
Thanks for continuing to take a look, Andrey.
On Mon, Jul 14, 2025 at 2:37 AM Andrey Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
>
> This might be a bit off-topic for this thread, but as long as the patch touches that code we can look into this too.
>
> If the VM all-visible bit is set while the page is not all-visible, an index-only scan will return incorrect results. I have observed this inconsistency a few times in production.
That's very unfortunate. I wonder what could be causing this. Do you
suspect a bug in Postgres? Or something wrong with the disk, etc?
> Two persistent subsystems (VM and heap) contradict each other, which is why I consider this data corruption. Yes, we can repair the VM by assuming the heap to be the source of truth in this case. But we must also emit the ERRCODE_DATA_CORRUPTED (XX001) code into the logs. In many cases this will alert the on-call SRE.
>
> To do so I propose replacing elog(WARNING, ...) with ereport(WARNING, (errcode(ERRCODE_DATA_CORRUPTED), ...)).
Ah, you mean the warnings currently in lazy_scan_prune(). To me this
suggestion makes sense. I see at least one other example that uses
ERRCODE_DATA_CORRUPTED at a level below ERROR.
I have attached a cleaned up and updated version of the patch set (it
doesn't yet include your suggested error message change).
What's new in this version
-----
In addition to general code, comment, and commit message improvements,
notable changes are as follows:
- I have used the GlobalVisState for determining if the whole page is
visible in a more natural way.
- I micro-benchmarked and identified some sources of regression in the
additional work SELECT queries do to set the VM. So, there are
several new commits addressing these (for example, inlining several
functions and unsetting all-visible when we see a dead tuple if we
won't attempt freezing).
- Because heap_page_prune_and_freeze() was getting long, I added some
helper functions.
Performance impact of setting the VM on-access
-------
I found that with the patch set applied, we set many pages all-visible
in the VM on access, resulting in a higher overall number of pages set
all-visible, reducing load for vacuum, and dramatically decreasing
heap fetches by index-only scans.
I devised a simple benchmark: 8 workers each insert 20 rows at a
time into a table with a few columns and update a single row that
they just inserted. Another worker queries the table once per second
using an index.
After running the benchmark for a few minutes, though the table was
autovacuumed several times in both cases, with the patchset applied,
15% more blocks were all-visible at the end of the benchmark.
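One way to check a number like this is sketched below. It is not necessarily how the figure above was obtained. It assumes the pg_visibility extension is installed and reads the actual visibility map through pg_visibility_map_summary() rather than pg_class.relallvisible, which is only an estimate refreshed by VACUUM and ANALYZE. The helper name all_visible_pct is made up for this sketch.

```shell
# Hypothetical helper: read one "all_visible|total_pages" line (psql -At
# output) on stdin and print the percentage of pages marked all-visible.
# Prints nothing for an empty table (total_pages = 0).
all_visible_pct() {
    awk -F'|' '$2 > 0 { printf "%.1f\n", 100 * $1 / $2 }'
}

# Usage against a running server (requires CREATE EXTENSION pg_visibility):
#   psql -XAtc "
#     SELECT all_visible,
#            pg_relation_size('simple_table') / current_setting('block_size')::int
#     FROM pg_visibility_map_summary('simple_table');
#   " | all_visible_pct
```

Running this at the same point of the benchmark on master and on the patched build gives two percentages that can be compared directly.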
And with my patch applied, index-only scans did far fewer heap
fetches. A SELECT count(*) of the table at the same point in the
benchmark did 10,000 heap fetches on master and 500 with the patch
applied (I used auto_explain to determine this).
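For anyone who wants to reproduce the heap fetch comparison without auto_explain, here is a minimal sketch that pulls the "Heap Fetches: N" lines out of text-format EXPLAIN (ANALYZE) output and sums them. The helper name sum_heap_fetches and the example group_id value are assumptions for illustration. The numbers above came from auto_explain, not from this script.

```shell
# Hypothetical helper: sum every "Heap Fetches: N" line found in
# text-format EXPLAIN (ANALYZE) output read from stdin. Prints 0 when the
# plan contains no index-only scan.
sum_heap_fetches() {
    awk '/Heap Fetches:/ { total += $NF } END { print total + 0 }'
}

# Usage against a running server (group_id = 1 is an arbitrary example):
#   psql -XAtc "EXPLAIN (ANALYZE)
#               SELECT count(*) FROM simple_table WHERE group_id = 1" \
#     | sum_heap_fetches
```

Note that the plan must actually use an index-only scan for the line to appear, so the result depends on planner choices as well as on VM state.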
With my patch applied, autovacuum workers write half as much WAL as on
master. Some of this is courtesy of other patches in the set which
eliminate separate WAL records for setting the page all-visible. But,
vacuum is also scanning fewer pages and dirtying fewer buffers because
they are being set all-visible on-access.
There are more details about the benchmark at the end of the email.
Setting pd_prune_xid on insert
------
The patch "Set-pd_prune_xid-on-insert.txt" can be applied as the last
patch in the set. It sets pd_prune_xid on insert (so pages filled by
COPY or insert can also be set all-visible in the VM before they are
vacuumed). I gave it a .txt extension because it currently fails
035_standby_logical_decoding due to a recovery conflict. I need to
investigate more to see if this is a bug in my patch set or elsewhere
in Postgres.
Besides the failing test, I have a feeling that my current heuristic
for whether or not to set the VM on-access is not quite right for
pages that have only been inserted to -- and if we get it wrong, we've
wasted those CPU cycles because we didn't otherwise need to prune the
page.
- Melanie
Benchmark
-------
# ${version} is expected to be set by the caller; it only labels the output files.
psql -c "
DROP TABLE IF EXISTS simple_table;
CREATE TABLE simple_table (
id SERIAL PRIMARY KEY,
group_id INT NOT NULL,
data TEXT,
created_at TIMESTAMPTZ DEFAULT now()
);
create index on simple_table(group_id);
"
pgbench \
--no-vacuum \
--random-seed=0 \
-c 8 \
-j 8 \
-M prepared \
-T 200 \
> "pgbench_run_summary_update_${version}" \
-f- <<EOF &
\set gid random(1,1000)
INSERT INTO simple_table (group_id, data)
SELECT :gid, 'inserted'
RETURNING id \gset
update simple_table set data = 'updated' where id = :id;
insert into simple_table (group_id, data)
select :gid, 'inserted'
from generate_series(1,20);
EOF
insert_pid=$!
pgbench \
--no-vacuum \
--random-seed=0 \
-c 1 \
-j 1 \
--rate=1 \
-M prepared \
-T 200 \
> "pgbench_run_summary_select_${version}" \
-f- <<EOF
\set gid random(1, 1000)
select max(created_at) from simple_table where group_id = :gid;
select count(*) from simple_table where group_id = :gid;
EOF
wait $insert_pid