Emit fewer vacuum records by reaping removable tuples during pruning

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>
Subject: Emit fewer vacuum records by reaping removable tuples during pruning
Date: 2023-11-13 22:28:50
Message-ID: CAAKRu_bgvb_k0gKOXWzNKWHt560R0smrGe3E8zewKPs8fiMKkw@mail.gmail.com
Lists: pgsql-hackers

Hi,

When there are no indexes on the relation, we can set items that would
otherwise become LP_DEAD directly to LP_UNUSED, removing them during
pruning. This saves us a vacuum WAL record, reducing WAL volume (and
time spent writing and syncing WAL).

See this example:

drop table if exists foo;
create table foo(a int) with (autovacuum_enabled=false);
insert into foo select i from generate_series(1,10000000)i;
update foo set a = 10;
\timing on
vacuum foo;

On my machine, the attached patch set makes vacuum 10% faster for this
example and decreases WAL bytes emitted by 40%.

Admittedly, this case is probably unusual in the real world. On-access
pruning would preclude it. For instance, run a SELECT * FROM foo before
the vacuum and the patch has no performance benefit.

However, it has no downside as far as I can tell. And, IMHO, it is a
code clarity improvement. This change means that lazy_vacuum_heap_page()
is only called when we are actually doing a second pass and reaping dead
items. I found it quite confusing that lazy_vacuum_heap_page() was
called by lazy_scan_heap() to set dead items unused in a block that we
just pruned.
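
The control-flow change described above can be sketched with a toy model.
This is not PostgreSQL code: the ToyPage type and the toy_* functions are
hypothetical stand-ins invented for illustration, each function returns
the number of WAL records it would emit, and HOT chains, index vacuuming,
and freezing are ignored. It assumes every item on the page starts out
normal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for line pointer states; not the real ItemIdData flags. */
typedef enum { TOY_LP_UNUSED, TOY_LP_NORMAL, TOY_LP_DEAD } ToyItemState;

#define TOY_MAX_ITEMS 8

typedef struct
{
    ToyItemState state[TOY_MAX_ITEMS];
    bool         removable[TOY_MAX_ITEMS]; /* tuple is dead to everyone */
    size_t       nitems;
} ToyPage;

/*
 * First pass (pruning). With indexes, removable items become dead and must
 * wait for the second pass. Without indexes, they go straight to unused.
 * Returns the number of WAL records emitted (0 or 1).
 */
static int
toy_prune(ToyPage *page, bool has_indexes, size_t *ndead_left)
{
    bool changed = false;

    assert(page != NULL && ndead_left != NULL);
    *ndead_left = 0;

    for (size_t i = 0; i < page->nitems; i++)
    {
        if (page->state[i] != TOY_LP_NORMAL || !page->removable[i])
            continue;

        if (has_indexes)
        {
            page->state[i] = TOY_LP_DEAD;
            (*ndead_left)++;
        }
        else
            page->state[i] = TOY_LP_UNUSED;

        changed = true;
    }
    return changed ? 1 : 0;
}

/*
 * Second pass (reaping). Sets dead items unused.
 * Returns the number of WAL records emitted (0 or 1).
 */
static int
toy_vacuum_page(ToyPage *page)
{
    bool changed = false;

    assert(page != NULL);
    for (size_t i = 0; i < page->nitems; i++)
    {
        if (page->state[i] == TOY_LP_DEAD)
        {
            page->state[i] = TOY_LP_UNUSED;
            changed = true;
        }
    }
    return changed ? 1 : 0;
}

/*
 * Process one page and return the total WAL records emitted. The second
 * pass runs only if pruning left dead items behind. In the real system,
 * index vacuuming would happen between the two passes.
 */
static int
toy_vacuum_one_page(ToyPage *page, bool has_indexes)
{
    size_t ndead;
    int    wal = toy_prune(page, has_indexes, &ndead);

    if (ndead > 0)
        wal += toy_vacuum_page(page);
    return wal;
}
```

In this model, a page with removable items costs one record when
has_indexes is false and two when it is true. The second-pass function is
reached only when dead items are actually being reaped.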

I think it also makes it clear that we should update the VM in
lazy_scan_prune(). All callers of lazy_scan_prune() will now consider
updating the VM after returning. And most of the state communicated back
to lazy_scan_heap() from lazy_scan_prune() is to inform it whether or
not to update the VM. I didn't do that in this patch set because I would
need to pass all_visible_according_to_vm to lazy_scan_prune() and that
change didn't seem worth the improvement in code clarity in
lazy_scan_heap().

I am planning to add a VM update into the freeze record, at which point
I will move the VM update code into lazy_scan_prune(). This will then
allow us to consolidate the freespace map update code for the prune and
noprune cases and make lazy_scan_heap() short and sweet.

Note that (on principle) this patch set is based on top of the bug fix
I proposed in [1].

- Melanie

[1] https://www.postgresql.org/message-id/CAAKRu_YiL%3D44GvGnt1dpYouDSSoV7wzxVoXs8m3p311rp-TVQQ%40mail.gmail.com

Attachment | Content-Type | Size
v1-0001-Release-lock-on-heap-buffer-before-vacuuming-FSM.patch | application/x-patch | 2.1 KB
v1-0002-Indicate-rel-truncation-unsafe-in-lazy_scan-no-pr.patch | application/x-patch | 7.0 KB
v1-0003-Set-would-be-dead-items-LP_UNUSED-while-pruning.patch | application/x-patch | 16.6 KB
