Re: should vacuum's first heap pass be read-only?

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: should vacuum's first heap pass be read-only?
Date: 2022-03-31 20:25:26
Message-ID: CA+TgmoZzgcKfMZrFbyNysZwcMD3jzWtzjFuJqJUAU-qc7Z1bVw@mail.gmail.com
Lists: pgsql-hackers

[ returning to this thread after a bit of a hiatus ]

On Fri, Feb 25, 2022 at 12:15 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> I can only speak for myself, but that sounds correct to me. IMO what
> we really want here is to create lots of options for VACUUM.

I agree. That seems like a good way of thinking about it.

> I don't think we want to, exactly. Maybe it's easier to store
> redundant TIDs than to avoid storing them in the first place (we can
> probably just accept some redundancy). There is a trade-off to be made
> there. I'm not at all sure of what the best trade-off is, though.

My intuition is that storing redundant TIDs will turn out to be a very
bad idea. I think that if we do that, the system will become prone to
a new kind of vicious cycle (to try to put this in words similar to
the ones you've been using to write about similar effects). Imagine
that we vacuum a table which contains N dead tuples a total of K times
in succession, but each time we either deliberately decide against
index vacuuming or get killed before we can complete it. If we don't
have logic to prevent duplicate entries from being added to the
conveyor belt, we will have N*K TIDs in the conveyor belt rather than
only N, and I think it's not hard to imagine situations where K could
be big enough for that to be hugely painful. Moreover, at some point,
we're going to need to deduplicate anyway, because each of those dead
TIDs can only be marked unused once. Letting the data pile up on the
conveyor belt in the hopes of sorting it out later seems almost
certain to be a losing proposition. The more data we collect on the
conveyor belt, the harder it's going to be when we eventually need to
deduplicate.
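
To put rough numbers on it, just for illustration: at 6 bytes per TID,
N = 100 million dead TIDs is about 600MB of conveyor belt data. If
K = 10 vacuums each re-add the same TIDs, that balloons to roughly
6GB, all of which eventually has to be read back and deduplicated
before those line pointers can be marked unused.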

Also, I think it's possible to deduplicate at a very reasonable cost
so long as (1) we enter each batch of TIDs into the conveyor belt as a
distinguishable "run" and (2) we never accumulate so many of these
runs at the same time that we can't do a single merge pass to turn
them into a sorted output stream. We're always going to discover dead
TIDs in sorted order, so as we make our pass over the heap, we can
simultaneously be doing an on-the-fly merge pass over the existing
runs that are on the conveyor belt. That means we never need to store
all the dead TIDs in memory at once. We just fetch enough data from
each "run" to cover the block numbers for which we're performing the
heap scan, and when the heap scan advances we throw away data for
blocks that we've passed and fetch data for the blocks we're now
reaching.
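
To make that concrete, here is a rough sketch of the kind of merge I
have in mind. It has nothing to do with the actual conveyor belt code
-- the DeadTid and TidRun structures and the in-memory arrays are just
stand-ins -- but it shows how a single pass over a set of sorted runs
can produce a sorted, deduplicated stream while only ever looking at
one "head" TID per run:

#include <stdio.h>
#include <stdint.h>

typedef struct
{
    uint32_t    block;          /* heap block number */
    uint16_t    offset;         /* line pointer offset within the block */
} DeadTid;

typedef struct
{
    const DeadTid *tids;        /* sorted, as discovered by a heap scan */
    int         ntids;
    int         next;           /* index of the next unconsumed TID */
} TidRun;

static int
tid_cmp(const DeadTid *a, const DeadTid *b)
{
    if (a->block != b->block)
        return a->block < b->block ? -1 : 1;
    if (a->offset != b->offset)
        return a->offset < b->offset ? -1 : 1;
    return 0;
}

/*
 * Merge all runs into one sorted, deduplicated stream.  Memory use is
 * bounded by the number of runs, not the number of dead TIDs.
 */
static void
merge_runs(TidRun *runs, int nruns)
{
    for (;;)
    {
        int         best = -1;
        DeadTid     winner;

        /* find the run whose head TID sorts lowest */
        for (int i = 0; i < nruns; i++)
        {
            if (runs[i].next >= runs[i].ntids)
                continue;
            if (best < 0 ||
                tid_cmp(&runs[i].tids[runs[i].next],
                        &runs[best].tids[runs[best].next]) < 0)
                best = i;
        }
        if (best < 0)
            break;              /* every run is exhausted */

        winner = runs[best].tids[runs[best].next++];

        /* skip the same TID in all runs: this is the deduplication */
        for (int i = 0; i < nruns; i++)
        {
            while (runs[i].next < runs[i].ntids &&
                   tid_cmp(&runs[i].tids[runs[i].next], &winner) == 0)
                runs[i].next++;
        }

        printf("(%u,%u)\n", (unsigned) winner.block, (unsigned) winner.offset);
    }
}

int
main(void)
{
    /* two overlapping runs, as successive interrupted vacuums might leave */
    static const DeadTid run1[] = {{1, 3}, {1, 7}, {2, 1}};
    static const DeadTid run2[] = {{1, 7}, {2, 1}, {2, 5}};
    TidRun      runs[] = {{run1, 3, 0}, {run2, 3, 0}};

    merge_runs(runs, 2);
    return 0;
}

A real implementation would presumably use a binary heap to pick the
lowest head and stream pages from the conveyor belt rather than keep
whole runs in memory, but the memory bound -- proportional to the
number of runs rather than the number of dead TIDs -- is the point.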

> > but if you are going to rescan the heap
> > again next time before doing any index vacuuming then why we want to
> > store them anyway.
>
> It all depends, of course. The decision needs to be made using a cost
> model. I suspect it will be necessary to try it out, and see.

But having said that, coming back to this with fresh eyes, I think
Dilip has a really good point here. If the first thing we do at the
start of every VACUUM is scan the heap in a way that is guaranteed to
rediscover all of the dead TIDs that we've previously added to the
conveyor belt plus maybe also new ones, we may as well just forget the
whole idea of having a conveyor belt at all. At that point we're just
talking about a system for deciding when to skip index vacuuming, and
the conveyor belt is a big, complicated piece of machinery that stores
data we don't really need for anything, because if we threw it out,
the next vacuum would reconstruct it anyhow and we'd get exactly the
same results with less code. The only way the conveyor belt system has
any
value is if we think that there is some set of circumstances where the
heap scan is separated in time from the index vacuum, such that we
might sometimes do an index vacuum without having done a heap scan
just before.

--
Robert Haas
EDB: http://www.enterprisedb.com
