Re: should vacuum's first heap pass be read-only?

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: should vacuum's first heap pass be read-only?
Date: 2022-02-04 20:18:17
Message-ID: CA+TgmobvjQvncsunHafiqV1ix_Ej4Uui52zcdfjSQaM48OvuyA@mail.gmail.com
Lists: pgsql-hackers

On Fri, Feb 4, 2022 at 2:16 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> On Thu, Feb 3, 2022 at 12:20 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > But maybe we should reconsider. What benefit do we get out of dirtying
> > the page twice like this, writing WAL each time? What if we went back
> > to the idea of having the first heap pass be read-only?
>
> What about recovery conflicts? Index vacuuming WAL records don't
> require their own latestRemovedXid field, since they can rely on
> earlier XLOG_HEAP2_PRUNE records instead. Since the TIDs that index
> vacuuming removes always point to LP_DEAD items in the heap, it's safe
> to lean on that.

Oh, that's an interesting consideration.

> > In fact, I'm
> > thinking we might want to go even further and try to prevent even hint
> > bit changes to the page during the first pass, especially because now
> we have checksums and wal_log_hints. If our vacuum cost settings are
> to be believed (and I am not sure that they are), dirtying a page is 10
> > times as expensive as reading one from the disk. So on a large table,
> > we're paying 44 vacuum cost units per heap page vacuumed twice, when
> > we could be paying only 24 such cost units. What a bargain!
>
> In practice HOT generally works well enough that the number of heap
> pages that prune significantly exceeds the subset that are also
> vacuumed during the second pass over the heap -- at least when heap
> fill factor has been tuned (which might be rare). The latter category
> of pages is not reported on by the enhanced autovacuum logging added
> to Postgres 14, so you might be able to get some sense of how this
> works by looking at that.

Is there an extra "not" in this sentence? Because otherwise it seems
like you're saying that I should look at the information that isn't
reported, which seems hard.

In any case, I think this might be a death knell for the whole idea.
It might be good to cut down the number of page writes by avoiding
writing them twice -- but not at the expense of having the second pass
have to visit a large number of pages it could otherwise skip. I
suppose we could write only those pages in the first pass that we
aren't going to need to write again later, but at that point I can't
really see that we're winning anything.
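
For what it's worth, to show my work on the 44-vs-24 arithmetic quoted
above: it assumes the post-v14 defaults of vacuum_cost_page_miss = 2 and
vacuum_cost_page_dirty = 20, and that the page has to be read in from
disk on each pass. A throwaway sketch:

    #include <stdio.h>

    int
    main(void)
    {
        const int page_miss = 2;    /* vacuum_cost_page_miss default */
        const int page_dirty = 20;  /* vacuum_cost_page_dirty default */

        /* status quo: both heap passes read the page and dirty it */
        int dirty_twice = 2 * (page_miss + page_dirty);

        /* proposal: first pass read-only, only the second pass dirties */
        int dirty_once = page_miss + (page_miss + page_dirty);

        printf("dirty twice: %d, dirty once: %d\n", dirty_twice, dirty_once);
        return 0;
    }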

> > Could we have our cake and eat it too by updating the FSM with
> > the amount of free space that the page WOULD have if we pruned it, but
> > not actually do so?
>
> Did you ever notice that VACUUM records free space after *it* prunes,
> using its own horizon? With a long running VACUUM operation, where
> unremoved "recently dead" tuples are common, it's possible that the
> amount of free space that's effectively available (available to every
> other backend) is significantly higher. And so we already record
> "subjective amounts of free space" -- though not necessarily by
> design.

Yes, I wondered about that. It seems like maybe a running VACUUM
should periodically refresh its notion of what cutoff to use.
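
Coming back to the "free space the page WOULD have" idea quoted above,
what I had in mind is roughly this -- purely a toy sketch with made-up
names, nothing like the real pruning code:

    #include <stddef.h>

    /*
     * Toy model: report to the FSM the space the page would have if it
     * were pruned under our own horizon, without dirtying the page.
     */
    typedef struct
    {
        size_t      current_free;    /* free space on the page right now */
        size_t      prunable_bytes;  /* space pruning would give back */
    } PageSpaceEstimate;

    size_t
    would_be_free_space(const PageSpaceEstimate *est)
    {
        return est->current_free + est->prunable_bytes;
    }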

> I'm not sure that what you're proposing here is the best way to go
> about it, but let's assume for a moment that it is. Can't you just
> simulate the conveyor belt approach, without needing a relation fork?
> Just store the same information in memory, accessed using the same
> interface, with a spillover path?

(I'm not sure it's best either.)

I think my concern here is about not having too many different code
paths for heap vacuuming. I agree that if we're going to vacuum
without an on-disk conveyor belt we can use an in-memory substitute.
However, to Greg's point, if we're using the conveyor belt, it seems
like we want to merge the second pass of one VACUUM into the first
pass of the next one. That is, if we start up a heap vacuum already
having a list of TIDs that can be marked unused, we want to do that
during the same pass of the heap that we prune and search for
newly-discovered dead TIDs. But we can't do that in the case where the
conveyor belt is only simulated, because our in-memory data structure
can't contain leftovers from a previous vacuum the way the on-disk
conveyor belt can. So it seems like the whole algorithm has to be
different. I'd like to find a way to avoid that.
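
To make the shape of that a bit more concrete, here's a toy sketch of
the merged pass -- invented names and a fake heap, just to show the
control flow I have in mind:

    #include <stdio.h>

    /*
     * Toy model only: each heap "item" is just a state flag, and the
     * TIDs left over from the previous VACUUM are a plain array standing
     * in for the conveyor belt.  For every block we first mark the
     * leftover TIDs (whose index entries are already gone) unused, then
     * "prune" and collect the newly dead TIDs for the next index pass.
     */

    enum item_state { NORMAL, DEAD, UNUSED };

    #define NBLOCKS 4
    #define NITEMS  8

    typedef struct { int blkno; int offnum; } TidRef;

    static enum item_state heap[NBLOCKS][NITEMS];

    /* stand-in for pruning: pretend every odd offset turns out dead */
    static int
    prune_block(int blkno, TidRef *newly_dead, int ndead)
    {
        for (int off = 1; off < NITEMS; off += 2)
        {
            heap[blkno][off] = DEAD;
            newly_dead[ndead].blkno = blkno;
            newly_dead[ndead].offnum = off;
            ndead++;
        }
        return ndead;
    }

    int
    main(void)
    {
        /* TIDs the previous VACUUM left on the conveyor belt */
        TidRef      leftover[] = { {0, 2}, {1, 4}, {3, 6} };
        int         nleftover = 3;
        TidRef      newly_dead[NBLOCKS * NITEMS];
        int         ndead = 0;

        for (int blkno = 0; blkno < NBLOCKS; blkno++)
        {
            /* step 1: finish the previous cycle's work on this block */
            for (int i = 0; i < nleftover; i++)
                if (leftover[i].blkno == blkno)
                    heap[blkno][leftover[i].offnum] = UNUSED;

            /* step 2: prune and remember work for the next cycle */
            ndead = prune_block(blkno, newly_dead, ndead);
        }

        printf("marked %d leftover TIDs unused, collected %d new dead TIDs\n",
               nleftover, ndead);
        return 0;
    }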

If this isn't entirely making sense, it may well be because I'm a
little fuzzy on all of it myself. But I hope it's clear enough that
you can figure out what it is that I'm worrying about. If not, I'll
keep trying to explain until we both reach a sufficiently non-fuzzy
state.

--
Robert Haas
EDB: http://www.enterprisedb.com
