Re: should vacuum's first heap pass be read-only?

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: should vacuum's first heap pass be read-only?
Date: 2022-02-04 19:16:13
Message-ID: CAH2-Wzmm6W5cVrOugNpyd4rF4sbi2+iVG-8Sj94w3uyhGxwyWw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Feb 3, 2022 at 12:20 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> But maybe we should reconsider. What benefit do we get out of dirtying
> the page twice like this, writing WAL each time? What if we went back
> to the idea of having the first heap pass be read-only?

What about recovery conflicts? Index vacuuming WAL records don't
require their own latestRemovedXid field, since they can rely on the
earlier XLOG_HEAP2_PRUNE records instead. Because the TIDs that index
vacuuming removes always point to LP_DEAD items in the heap, it's safe
to lean on that.
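
To make that division of labor concrete, here is a rough sketch of the
standby side of things. This is simplified and is not the actual
heap_xlog_prune()/btree_xlog_vacuum() redo routines -- just the shape of
them, against PG14-era structs:

#include "postgres.h"
#include "access/heapam_xlog.h"     /* xl_heap_prune */
#include "access/xlogreader.h"      /* XLogReaderState, XLogRecGetBlockTag() */
#include "storage/standby.h"        /* ResolveRecoveryConflictWithSnapshot() */

/* Sketch of replaying XLOG_HEAP2_PRUNE -- this is where conflicts come from */
static void
prune_redo_sketch(XLogReaderState *record)
{
    xl_heap_prune *xlrec = (xl_heap_prune *) XLogRecGetData(record);
    RelFileNode rnode;

    XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);

    /*
     * The prune record carries its own latestRemovedXid, so it can cancel
     * conflicting snapshots on the standby.  (The real code only does this
     * when actually in hot standby.)
     */
    ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid, rnode);

    /* ... turn pruned items into LP_DEAD/LP_REDIRECT/LP_UNUSED ... */
}

/* Sketch of replaying an index vacuum record (e.g. btree's) */
static void
index_vacuum_redo_sketch(XLogReaderState *record)
{
    /*
     * No latestRemovedXid here, and no conflict resolution either: every
     * TID being removed points at an LP_DEAD stub in the heap, which no
     * snapshot could have seen anyway.
     */
    (void) record;

    /* ... physically remove the index tuples ... */
}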

> In fact, I'm
> thinking we might want to go even further and try to prevent even hint
> bit changes to the page during the first pass, especially because now
> we have checksums and wal_log_hints. If our vacuum cost settings are
> to be believed (and I am not sure that they are), dirtying a page is 10
> times as expensive as reading one from the disk. So on a large table,
> we're paying 44 vacuum cost units per heap page vacuumed twice, when
> we could be paying only 24 such cost units. What a bargain!
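
For reference, assuming the default cost settings as of Postgres 14
(vacuum_cost_page_miss = 2, vacuum_cost_page_dirty = 20, i.e. the 10x
ratio), the 44-vs-24 arithmetic appears to be:

  two passes, both dirtying the page:  (2 + 20) + (2 + 20) = 44
  read-only first pass:                 2       + (2 + 20) = 24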

In practice HOT generally works well enough that the number of heap
pages that get pruned significantly exceeds the subset that are also
vacuumed during the second pass over the heap -- at least when heap
fill factor has been tuned (which might be rare). The latter category
of pages is now reported on by the enhanced autovacuum logging added
to Postgres 14, so you might be able to get some sense of how this
works by looking at that.

> Could we have our cake and eat it too by updating the FSM with
> the amount of free space that the page WOULD have if we pruned it,
> without actually doing so?

Did you ever notice that VACUUM records free space after *it* prunes,
using its own horizon? With a long-running VACUUM operation, where
unremoved "recently dead" tuples are common, it's possible that the
amount of free space that's effectively available (available to every
other backend, which can prune again with a newer horizon) is
significantly higher than the amount that VACUUM records. And so we
already record "subjective amounts of free space" -- though not
necessarily by design.
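
A rough sketch of that -- not the actual vacuumlazy.c code, but its
overall shape -- showing that what ends up in the FSM is whatever is
left over after pruning with VACUUM's own horizon:

#include "postgres.h"
#include "storage/bufmgr.h"
#include "storage/bufpage.h"
#include "storage/freespace.h"
#include "utils/rel.h"

/* Sketch only -- not the real code in vacuumlazy.c */
static void
record_post_prune_freespace(Relation rel, Buffer buf, BlockNumber blkno)
{
    Page    page = BufferGetPage(buf);
    Size    freespace;

    /* ... prune the page using VACUUM's OldestXmin, set when VACUUM began ... */

    /*
     * What goes into the FSM is the free space as seen by this VACUUM.
     * Another backend that pruned with a newer horizon might be able to
     * free more, so this is already a "subjective" amount.
     */
    freespace = PageGetHeapFreeSpace(page);
    RecordPageWithFreeSpace(rel, blkno, freespace);
}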

> I'm thinking about this because of the "decoupling table and index
> vacuuming" thread, which I was discussing with Dilip this morning. In
> a world where table vacuuming and index vacuuming are decoupled, it
> feels like we want to have only one kind of heap vacuum. It pushes us
> in the direction of unifying the first and second pass, and doing all
> the cleanup work at once. However, I don't know that we want to use
> the approach described there in all cases. For a small table that is,
> let's just say, not part of any partitioning hierarchy, I'm not sure
> that using the conveyor belt approach makes a lot of sense, because
> the total amount of work we want to do is so small that we should just
> get it over with and not clutter up the disk with more conveyor belt
> forks -- especially for people who have large numbers of small tables,
> the inode consumption could be a real issue.

I'm not sure that what you're proposing here is the best way to go
about it, but let's assume for a moment that it is. Can't you just
simulate the conveyor belt approach, without needing a relation fork?
Just store the same information in memory, accessed using the same
interface, with a spillover path?
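
Something along these lines is what I have in mind -- every name here is
made up purely for illustration, nothing like it exists today. The
caller sees one interface, and spilling is an implementation detail:

/*
 * Hypothetical interface, invented for illustration only -- the point is
 * that the caller can't tell whether the dead TIDs live in local memory,
 * in a spill file, or in a conveyor belt relation fork.
 */
typedef struct DeadTidStore DeadTidStore;

extern DeadTidStore *deadtidstore_create(Relation heaprel, Size mem_limit);
extern void deadtidstore_add(DeadTidStore *store,
                             const ItemPointerData *tids, int ntids);
extern bool deadtidstore_next(DeadTidStore *store, ItemPointerData *tid);
extern void deadtidstore_destroy(DeadTidStore *store);

/*
 * deadtidstore_add() appends to an in-memory array until mem_limit is
 * exceeded, at which point it starts writing sorted runs to a temp file
 * (the "spillover path").  deadtidstore_next() merges the runs and
 * returns TIDs in heap order, just as the conveyor belt would.
 */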

Ideally VACUUM will be able to use the conveyor belt for any table.
Whether or not it actually happens should be decided at the latest
possible point during VACUUM, based on considerations about the actual
number of dead items that we now need to remove from indexes, as well
as metadata from any preexisting conveyor belt.
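
To sketch what "decide at the latest possible point" might look like
(every name and threshold here is a placeholder, nothing more):

/* Placeholder names and thresholds -- illustration only */
static bool
use_conveyor_belt_this_time(int64 num_dead_items,
                            bool conveyor_belt_already_exists,
                            int64 small_vacuum_threshold)
{
    /*
     * If an earlier VACUUM left a conveyor belt behind, keep using it --
     * the indexes may still have queued-up TIDs that need to go away.
     */
    if (conveyor_belt_already_exists)
        return true;

    /*
     * Otherwise, only take on the overhead of creating a new conveyor
     * belt when there is enough index vacuuming work to justify it.
     */
    return num_dead_items > small_vacuum_threshold;
}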

--
Peter Geoghegan
