From: | Renan Alves Fonseca <renanfonseca(at)gmail(dot)com> |
---|---|
To: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | PoC: Compute a histogram of prune_xid to support autovacuum improvements |
Date: | 2025-06-03 16:39:44 |
Message-ID: | 87ecw0srsf.fsf@gmail.com |
Lists: | pgsql-hackers |
Hi all,
as part of the effort to improve the autovacuum algorithm, this patch proposes
to maintain a histogram of the *smallest prunable xid per page* for each
relation. This makes it possible to estimate the number of pages that
vacuum would prune for a given cutoff.
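To illustrate how such a histogram could answer that question, here is a minimal Python sketch. It is not code from the patch: the function name `estimate_prunable_pages` is hypothetical, xids are treated as plain integers (no wraparound), and pages are assumed to be spread uniformly inside each bin.

```python
from bisect import bisect_right


def estimate_prunable_pages(bounds, freqs, cutoff):
    """Estimate how many pages have a prune_xid below `cutoff`.

    bounds: sorted xids with len(freqs) + 1 entries; bin i covers
            [bounds[i], bounds[i + 1]).
    freqs:  number of pages whose prune_xid falls in each bin.
    """
    if len(bounds) != len(freqs) + 1:
        raise ValueError("need exactly one more bound than bins")
    if cutoff <= bounds[0]:
        return 0.0
    if cutoff >= bounds[-1]:
        return float(sum(freqs))
    # Index of the bin that contains the cutoff.
    i = bisect_right(bounds, cutoff) - 1
    fully_covered = sum(freqs[:i])
    # Assumption: pages are spread uniformly inside the partial bin.
    fraction = (cutoff - bounds[i]) / (bounds[i + 1] - bounds[i])
    return fully_covered + fraction * freqs[i]
```

With bounds `[100, 200, 300, 400]` and counts `[10, 20, 30]`, a cutoff of 250 covers the first bin entirely and half of the second one.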
The *smallest prunable xid per page* is the prune_xid field in each page
header. The value of prune_xid is not always consistent with the
contents of the page, but this patch does not try to improve on that. We
assume that the current accuracy of prune_xid is good enough.
The histogram lives in PgStat_StatTabEntry, and so it makes use of
pgstat machinery. In particular, there is a per-backend transient
histogram that is merged into the main shared histogram using
pgstat_report_vacuum() or pgstat_relation_flush_cb().
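The local-to-shared merge can be pictured with the following Python sketch. It is only an illustration, not the patch's code: the class names are hypothetical, the local and shared histograms are assumed to have the same bins, and a `threading.Lock` stands in for whatever locking the shared stats entry uses.

```python
import threading


class SharedHistogram:
    """Stand-in for the histogram stored in the shared stats entry."""

    def __init__(self, nbins):
        self.freqs = [0] * nbins
        self.lock = threading.Lock()


class LocalHistogram:
    """Per-backend pending deltas; negative when pages leave a bin."""

    def __init__(self, nbins):
        self.deltas = [0] * nbins

    def flush_into(self, shared):
        if len(self.deltas) != len(shared.freqs):
            raise ValueError("local and shared histograms must have the same bins")
        # Apply all pending deltas while holding the shared lock.
        with shared.lock:
            for i, delta in enumerate(self.deltas):
                shared.freqs[i] += delta
        # Reset so the same deltas are never applied twice.
        self.deltas = [0] * len(self.deltas)
```

Keeping only deltas on the backend side means the hot path never touches shared memory; the lock is taken once per flush.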
This histogram uses a fixed-size data structure, but its bounds are
dynamic. Over time, some bins are merged to make room for a fresh new
bin that covers the ever-increasing xids. Maintaining the
histogram bounds is a relatively expensive operation, whereas a simple
update of the bin counts is very efficient. So, we can arrange to do the
expensive operation outside the hot paths.
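A minimal Python sketch of this idea follows. It is not the patch's implementation: the class and method names are hypothetical, xids are plain integers (no wraparound), and the policy of merging the adjacent pair of bins with the smallest combined count is an assumption chosen for illustration.

```python
from bisect import bisect_right


class PruneXidHistogram:
    """Fixed number of bins whose bounds move forward over time."""

    def __init__(self, bounds):
        if len(bounds) < 3 or list(bounds) != sorted(set(bounds)):
            raise ValueError("need at least 3 strictly increasing bounds")
        self.bounds = list(bounds)
        self.freqs = [0] * (len(bounds) - 1)

    def add(self, xid, count=1):
        """Cheap path: bump the counter of the bin that contains xid."""
        if xid >= self.bounds[-1]:
            raise ValueError("xid beyond last bound; call extend() first")
        # xids older than the first bound are counted in the first bin.
        i = max(bisect_right(self.bounds, xid) - 1, 0)
        self.freqs[i] += count

    def extend(self, new_upper):
        """Expensive path: merge two bins, then open a new bin on top."""
        if new_upper <= self.bounds[-1]:
            raise ValueError("new upper bound must exceed the current one")
        # Assumed policy: merge the adjacent pair with the smallest total.
        i = min(range(len(self.freqs) - 1),
                key=lambda k: self.freqs[k] + self.freqs[k + 1])
        self.freqs[i:i + 2] = [self.freqs[i] + self.freqs[i + 1]]
        del self.bounds[i + 1]
        # Appending the new bin restores the fixed number of bins.
        self.bounds.append(new_upper)
        self.freqs.append(0)
```

Here `add()` is the cheap counter update and `extend()` is the expensive bounds maintenance, which a caller could defer to a point outside the hot path.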
In order to collect data, we keep track of prune_xid in (1) on-access heap
pruning, (2) vacuum and (3) heap_{delete,update,insert}. Adding work in
(3) might raise eyebrows since it is a cost per tuple. However, we only
do something when the page's prune_xid changes, so in practice it is
a cost per page.
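The per-tuple hook can be sketched as follows in Python. This is an illustration under assumptions, not the patch's code: the function names are hypothetical, the histogram is assumed to move a page from its old bin to its new bin, and `INVALID_XID = 0` stands in for InvalidTransactionId (a page with nothing to prune).

```python
from bisect import bisect_right

INVALID_XID = 0  # stand-in for InvalidTransactionId


def bin_index(bounds, xid):
    """Return the bin containing xid, clamped to the first/last bin."""
    i = bisect_right(bounds, xid) - 1
    return min(max(i, 0), len(bounds) - 2)


def on_tuple_modified(bounds, freqs, old_prune_xid, new_prune_xid):
    """Called once per modified tuple; True if the histogram changed."""
    if new_prune_xid == old_prune_xid:
        # Common case: the page hint did not move, nothing to do.
        return False
    if old_prune_xid != INVALID_XID:
        freqs[bin_index(bounds, old_prune_xid)] -= 1
    if new_prune_xid != INVALID_XID:
        freqs[bin_index(bounds, new_prune_xid)] += 1
    return True
```

Since a page's prune_xid usually changes only on the first modification after pruning, most calls take the early return.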
You can try it out with pgbench like this:
``` shell
pgbench -i
pgbench -T 30
```
``` psql
\set tt pgbench_accounts
\i src/test/regress/sql/prune_xid_aux_check.sql
```
The functions pg_stat_get_prune_xid_{freqs,bounds} return the prune_xid
histogram for a given relation. In the second part of the script above,
we use *pageinspect* to check the correctness of the computed
histogram. In my tests, a small difference sometimes shows up between the
two. It is really annoying: I'm struggling with it and I hope
someone can help me find the missing bits.
Regarding performance, I have not observed a noticeable difference using
pgbench, but I certainly don't have a good setup for benchmarking. Using
*perf*, I could observe that the function
pgstat_update_relation_prune_xid_histogram(), which collects data in
almost all cases, has an overall time much lower than that of
pgstat_count_heap_update(), for example. I've looked at perf data using
pgbench and using huge batch updates.
The initial version of this work proposed a histogram of dead-tuple
xmax values for each relation. After some suggestions in the Discord hackers
channel, I understood that page-wise information can be more useful for
autovacuum planning. There is more detailed information in the file
patch-notes.{org,md} and, of course, in the code itself.
The attached patch is based on REL_18_BETA1. Sorry for not sending a
complete, polished patch, but I feel that I really need some feedback at
this point. Above all, I'd like to know if someone is interested in
using this information to improve the autovacuum algorithm. Otherwise,
we cannot justify this patch.
Looking forward to any kind of feedback.
Best Regards,
Renan Fonseca
Attachment | Content-Type | Size |
---|---|---|
v1-0001-prune_xid-hist-patch-description.patch | text/x-patch | 10.1 KB |
v1-0002-prune_xid-hist-create-data-structures.patch | text/x-patch | 2.8 KB |
v1-0003-prune_xid-hist-pgstat-prune_xid-histogram-interfa.patch | text/x-patch | 1.4 KB |
v1-0004-prune_xid-hist-add-catalog-functions.patch | text/x-patch | 3.4 KB |
v1-0005-prune_xid-hist-core-functions.patch | text/x-patch | 7.3 KB |
v1-0006-prune_xid-hist-initialize-shared-histogram-on-TRU.patch | text/x-patch | 907 bytes |
v1-0007-prune_xid-hist-initialize-local-histogram.patch | text/x-patch | 823 bytes |
v1-0008-prune_xid-hist-collect-data-from-heap_-delete-upd.patch | text/x-patch | 4.2 KB |
v1-0009-prune_xid-hist-collect-data-from-opportunistic-pr.patch | text/x-patch | 1.0 KB |
v1-0010-prune_xid-hist-collect-data-from-vacuum.patch | text/x-patch | 1.7 KB |
v1-0011-prune_xid-hist-transfer-data-to-shared-histogram.patch | text/x-patch | 2.7 KB |
v1-0012-prune_xid-hist-tests.patch | text/x-patch | 3.7 KB |