Re: decoupling table and index vacuum

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: decoupling table and index vacuum
Date: 2021-04-22 14:27:34
Message-ID: CAD21AoDJcGFQquLnWRiHWDPgdZa_JBrqwLQCQOtXC2A-OJeG-g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Apr 22, 2021 at 8:51 AM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
> On Wed, Apr 21, 2021 at 8:21 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > We are used to thinking about table vacuum and index vacuum as parts
> > of a single, indivisible operation. You vacuum the table -- among
> > other things by performing HOT pruning and remembering dead TIDs --
> > and then you vacuum the indexes -- removing the remembered TIDs from
> > the index -- and then you vacuum the table some more, setting those
> > dead TIDs unused -- and then you're done. And along the way you do
> > some other things too like considering truncation that aren't relevant
> > to the point I want to make here. Now, the problem with this is that
> > every index has its own needs, which are separate from the needs of
> > the tables, as I think Peter Geoghegan and Masahiko Sawada were also
> > discussing recently.
>
> I'm very happy to see that you've taken an interest in this work! I
> believe it's an important area. It's too important to be left to only
> two contributors. I welcome your participation as an equal partner in
> the broader project to fix problems with VACUUM.

+many

>
> > Now, the reason for this is that when we discover dead TIDs, we only
> > record them in memory, not on disk. So, as soon as VACUUM ends, we
> > lose all knowledge of whether those TIDs were and must rediscover
> > them. Suppose we didn't do this, and instead had a "dead TID" fork for
> > each table.
>
> I had a similar idea myself recently -- clearly remembering the TIDs
> that you haven't vacuumed to save work later on makes a lot of sense.
> I didn't get very far with it, even in my own head, but I like the
> direction you're taking it. Having it work a little like a queue makes
> a lot of sense.

Agreed. (I now remembered I gave a talk about a similar idea at PGCon
a couple years ago).

Another good point of this "dead TID fork" design is that IIUC we
don't necessarily need to make it crash-safe. We would not need WAL
logging for remembering dead TIDs. If the server crashes, we can
simply throw it away and assume we haven't done the first heap pass
yet.

>
> > Suppose further that this worked like a conveyor belt,
> > similar to WAL, where every dead TID we store into the fork is
> > assigned an identifying 64-bit number that is never reused. Then,
> > suppose that for each index, we store the number of the oldest entry
> > that might still need to be vacuumed from the index. Every time you
> > perform what we now call the first heap pass of a VACUUM, you add the
> > new TIDs you find to the dead TID fork.
>
> Maybe we can combine this known-dead-tid structure with the visibility
> map. Index-only scans might be able to reason about blocks that are
> mostly all-visible, but still have stub LP_DEAD line pointers that
> this other structure knows about -- so you can get index-only scans
> without requiring a full round of traditional vacuuming. Maybe there
> is some opportunity like that, but not sure how to fit it in to
> everything else right now.

Interesting idea.

>
> > Every time you vacuum an
> > index, the TIDs that need to be removed are those between the
> > oldest-entry pointer for that index and the current end of the TID
> > fork. You remove all of those and then advance your oldest-entry
> > pointer accordingly. If that's too many TIDs to fit in
> > maintenance_work_mem, you can just read as many as will fit and
> > advance your oldest-entry pointer less far. Every time you perform
> > what we now call the second heap pass of a VACUUM, you find all the
> > TIDs that precede every index's oldest-entry pointer and set them
> > unused. You then throw away the associated storage at the OS level.
> > This requires a scheme where relations can be efficiently truncated
> > from the beginning rather than only at the end, which is why I said "a
> > conveyor belt" and "similar to WAL". Details deliberately vague since
> > I am just brainstorming here.

The dead TID fork needs to also be efficiently searched. If the heap
scan runs twice, the collected dead TIDs on each heap pass could be
overlapped. But we would not be able to merge them if we did index
vacuuming on one of indexes at between those two heap scans. The
second time heap scan would need to record only TIDs that are not
collected by the first time heap scan.

>
> > Clearly,
> > we do not want to vacuum each partition by scanning the 1GB partition
> > + the 50MB local index + the 50GB global index. That's insane. With
> > the above system, since everything's decoupled, you can vacuum the
> > partition tables individually as often as required, and similarly for
> > their local indexes, but put off vacuuming the global index until
> > you've vacuumed a bunch, maybe all, of the partitions, so that the
> > work of cleaning up the global index cleans up dead TIDs from many or
> > all partitions instead of just one at a time.
>
> I can't think of any other way of sensibly implementing global indexes.

Given that we could load all dead TIDs from many or all partitions,
having dead TIDs on the memory with an efficient format would also
become important.

> > It's probably tricky to get the
> > autovacuum algorithm right here, but there seems to be some room for
> > optimism.
>
> I think that it's basically okay if global indexes suck when you do
> lots of UPDATEs -- this is a limitation that users can probably live
> with. What's not okay is if they suck when you do relatively few
> UPDATEs. It's especially not okay to have to scan the global index to
> delete one index tuple that points to one LP_DEAD item.

Right. Given decoupling index vacuuming, I think the index’s garbage
statistics are important which preferably need to be fetchable without
accessing indexes. It would be not hard to estimate how many index
tuples might be able to be deleted by looking at the dead TID fork but
it doesn’t necessarily match the actual number.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Zhihong Yu 2021-04-22 14:41:06 Re: TRUNCATE on foreign table
Previous Message Justin Pryzby 2021-04-22 14:06:51 Re: TRUNCATE on foreign table