Re: should vacuum's first heap pass be read-only?

From: Greg Stark <stark(at)mit(dot)edu>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: should vacuum's first heap pass be read-only?
Date: 2022-02-04 20:05:06
Message-ID: CAM-w4HPqAk-W3Uwep-dJ+MOk3je71NSZG5q0KaMtcCXTjCOjJg@mail.gmail.com
Lists: pgsql-hackers

On Thu, 3 Feb 2022 at 12:21, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> VACUUM's first pass over the heap is implemented by a function called
> lazy_scan_heap(), while the second pass is implemented by a function
> called lazy_vacuum_heap_rel(). This seems to imply that the first pass
> is primarily an examination of what is present, while the second pass
> does the real work. This used to be more true than it now is.

I've been out of touch for a while but I'm trying to catch up with the
progress of the past few years.

Whatever happened to the idea of "rotating" the work of vacuum, so
that all the work of the second pass would actually be deferred until
the first pass of the next vacuum cycle?

That would also have the effect of eliminating the duplicate work,
both the writes with their WAL generation and the extra heap scan.
The only heap scan would be "remove line pointers previously cleaned
from indexes, and prune dead tuples, recording them to clean from
indexes in future". The index scan would remove the index entries for
those line pointers and record them to be removed from the heap in a
future heap scan.
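
To make the rotation concrete, here is a toy model in Python. It is
only a sketch of the idea as described above, not PostgreSQL code:
ToyTable, rotated_vacuum_cycle and the state names are all made up for
illustration, and the "cleaned_from_index" set stands in for whatever
on-disk state would have to survive between vacuum cycles.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ToyTable:
    # Hypothetical model: TID -> "live", "dead_tuple", "lp_dead" or "unused".
    heap: Dict[int, str]
    # TIDs that the (single) toy index still points at.
    index: Set[int]
    # State carried from one vacuum cycle to the next: TIDs whose index
    # entries were already removed by the previous cycle.
    cleaned_from_index: Set[int] = field(default_factory=set)


def rotated_vacuum_cycle(table: ToyTable) -> None:
    """One vacuum cycle with a single heap pass followed by an index pass."""
    pending: Set[int] = set()

    # The only heap pass: finish last cycle's work and start this cycle's.
    for tid in list(table.heap):
        state = table.heap[tid]
        if tid in table.cleaned_from_index:
            # No index entry points here any more, so the line pointer
            # can be reclaimed.
            table.heap[tid] = "unused"
        elif state == "dead_tuple":
            # Prune the tuple, leaving a dead line pointer behind.
            table.heap[tid] = "lp_dead"
            pending.add(tid)
        elif state == "lp_dead":
            # Already pruned earlier, but indexes still reference it.
            pending.add(tid)

    # Index pass: drop entries for everything recorded by the heap pass.
    table.index -= pending

    # Remember what the next cycle's heap pass is allowed to reclaim.
    table.cleaned_from_index = pending
```

In this model a tuple that is dead at the start of cycle N has its
index entries removed in cycle N, but its line pointer only becomes
reusable during the heap pass of cycle N+1.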

The downside would mainly be the latency before the actual tuples
get cleaned up from the table. That is not much of an issue for space
these days, thanks to tuple pruning, but it is more and more of an
issue for XID wraparound. Also, having to record the line pointers
that have been cleaned from indexes somewhere on disk for the
subsequent vacuum would be extra on-disk state, and we've learned that
means extra complexity.

--
greg
