Re: should vacuum's first heap pass be read-only?

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: should vacuum's first heap pass be read-only?
Date: 2022-02-04 20:23:20
Message-ID: CA+TgmoZc=1TRoL1m2v6Uz25TzatNitAfXH6fK06SnhKz7_5wuQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Feb 4, 2022 at 3:05 PM Greg Stark <stark(at)mit(dot)edu> wrote:
> Whatever happened to the idea to "rotate" the work of vacuum? All
> the work of the second pass would actually be deferred until the first
> pass of the next vacuum cycle.
>
> That would also have the effect of eliminating the duplicate work,
> both the writes with the WAL generation as well as the actual scan.
> The only heap scan would be "remove line pointers previously cleaned
> from indexes and prune dead tuples recording them to clean from
> indexes in future". The index scan would remove line pointers and
> record them to be removed from the heap in a future heap scan.

I vaguely remember previous discussions of this, but only vaguely, so
if there are threads on the list, feel free to send pointers. It seems to
me that in order to do this, we'd need some way of storing the
TIDs that were found to be dead in one VACUUM so that they can be
marked unused in the next VACUUM. The conveyor belt patches on
which Dilip's work is based provide exactly that machinery, and his
patches use it for exactly that purpose. But it feels like a
big, sudden change from the way things work now, and I'm trying to
think of ways to make it more incremental, and thus hopefully less
risky.
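
To make the "rotated" scheme concrete, here is a toy model of what is
being described: one heap pass per cycle that first finishes the
previous cycle's work and then starts this cycle's. It is only a sketch
under simplifying assumptions. It is not the conveyor belt code or
Dilip's patches, and every name in it (ToyTable, index_cleaned,
rotated_vacuum) is made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    # TID -> state, one of "normal", "dead_tuple", "lp_dead", "unused".
    heap: dict
    # TIDs that still have an index entry pointing at them.
    index: set
    # Hypothetical persistent store: TIDs whose index entries were removed
    # by the previous cycle and can now be marked unused in the heap.
    index_cleaned: set = field(default_factory=set)


def rotated_vacuum(t: ToyTable) -> None:
    # Heap pass, part 1: finish the previous cycle. These TIDs no longer
    # have index entries, so the line pointers can be reused safely.
    for tid in sorted(t.index_cleaned):
        t.heap[tid] = "unused"
    t.index_cleaned.clear()

    # Heap pass, part 2: prune dead tuples down to dead line pointers and
    # remember them for index cleanup.
    newly_dead = set()
    for tid, state in list(t.heap.items()):
        if state == "dead_tuple":
            t.heap[tid] = "lp_dead"
            state = "lp_dead"
        if state == "lp_dead":
            newly_dead.add(tid)

    # Index pass: remove the index entries, then record the TIDs so the
    # next cycle's heap pass can mark them unused.
    t.index -= newly_dead
    t.index_cleaned = newly_dead
```

The index_cleaned set is the extra on-disk state: it has to survive
between VACUUMs, which is where the added complexity comes from.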

> The downside would mainly be in the latency before the actual tuples
> get cleaned up from the table. That is not so much of an issue as far
> as space these days with tuple pruning but is more and more of an
> issue with XID wraparound. Also, having to record the line pointers
> that have been cleaned from indexes somewhere on disk for the
> subsequent vacuum would be extra state on disk and we've learned that
> means extra complexity.

I don't think there's any XID wraparound issue here. Phase 1 does a
HOT prune, after which only dead line pointers remain, not dead
tuples, and dead line pointers contain no XIDs. Phase 2 only sets
those dead line pointers back to unused.
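
As a rough illustration of why the dead line pointers carry no XIDs,
here is a small model of a heap page. The four flag values match the
ones in PostgreSQL's itemid.h, but everything else (the classes, the
helper names, and the is_dead callback) is hypothetical and heavily
simplified. In particular it ignores HOT chains and redirects.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Line pointer states, numbered as in PostgreSQL's itemid.h.
LP_UNUSED, LP_NORMAL, LP_REDIRECT, LP_DEAD = 0, 1, 2, 3


@dataclass
class HeapTuple:
    xmin: int
    xmax: Optional[int] = None  # None means "not deleted" in this model


@dataclass
class LinePointer:
    flags: int
    tuple: Optional[HeapTuple] = None  # storage exists only for LP_NORMAL


def prune(page: List[LinePointer],
          is_dead: Callable[[HeapTuple], bool]) -> List[int]:
    """First pass: drop dead tuples' storage, leaving LP_DEAD stubs.

    Returns the 1-based offsets of all dead line pointers on the page.
    """
    dead_offsets = []
    for off, lp in enumerate(page, start=1):
        if lp.flags == LP_NORMAL and is_dead(lp.tuple):
            lp.flags = LP_DEAD
            lp.tuple = None  # the tuple, and its xmin/xmax, are gone
        if lp.flags == LP_DEAD:
            dead_offsets.append(off)
    return dead_offsets


def mark_unused(page: List[LinePointer], offsets: List[int]) -> None:
    """Second pass: LP_DEAD -> LP_UNUSED, once index entries are gone."""
    for off in offsets:
        lp = page[off - 1]
        if lp.flags != LP_DEAD:
            raise ValueError(f"offset {off} is not a dead line pointer")
        lp.flags = LP_UNUSED


def xids_on_page(page: List[LinePointer]) -> List[int]:
    """Collect every XID still stored on the page."""
    xids = []
    for lp in page:
        if lp.tuple is not None:
            xids.append(lp.tuple.xmin)
            if lp.tuple.xmax is not None:
                xids.append(lp.tuple.xmax)
    return xids
```

In this model, delaying mark_unused() leaves xids_on_page() unchanged,
which is the sense in which deferring the second pass does not hold
back freezing.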

As for the other part, that's pretty much exactly the complexity that
I'm worrying about.

--
Robert Haas
EDB: http://www.enterprisedb.com
