Re: decoupling table and index vacuum

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: decoupling table and index vacuum
Date: 2021-09-15 22:08:41
Message-ID: CAH2-Wz=9R83wcwZcPUH4FVPeDM4znzbzMvp3rt21+XhQWMU8+g@mail.gmail.com
Lists: pgsql-hackers

On Wed, Apr 21, 2021 at 8:21 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> Now, the reason for this is that when we discover dead TIDs, we only
> record them in memory, not on disk. So, as soon as VACUUM ends, we
> lose all knowledge of what those TIDs were and must rediscover
> them. Suppose we didn't do this, and instead had a "dead TID" fork for
> each table. Suppose further that this worked like a conveyor belt,
> similar to WAL, where every dead TID we store into the fork is
> assigned an identifying 64-bit number that is never reused.
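
The conveyor-belt idea quoted above can be sketched in a few lines of
Python. This is only a toy in-memory model, not anything from an actual
patch: the class name DeadTidConveyor, its methods, and the per-index
progress pointers are all assumptions made up for illustration. The one
property taken from the quoted text is that each stored dead TID gets an
identifying 64-bit number that is never reused.

```python
from dataclasses import dataclass, field


@dataclass
class DeadTidConveyor:
    """Toy in-memory stand-in for a per-table "dead TID" fork (hypothetical)."""

    next_id: int = 0       # next identifier to hand out; never reused
    first_id: int = 0      # identifier of the oldest entry still stored
    tids: list = field(default_factory=list)            # stored (block, offset) pairs
    index_progress: dict = field(default_factory=dict)  # index name -> next id to process

    def append(self, tid):
        """Store one dead TID and return its identifier."""
        assigned = self.next_id
        assert assigned < 2**64, "identifier space exhausted"
        self.tids.append(tid)
        self.next_id += 1
        return assigned

    def pending_for(self, index_name):
        """Return the stored TIDs that this index has not processed yet."""
        start = max(self.index_progress.get(index_name, self.first_id), self.first_id)
        return self.tids[start - self.first_id:]

    def mark_index_vacuumed(self, index_name):
        """Record that an index has processed everything stored so far."""
        self.index_progress[index_name] = self.next_id

    def truncate(self, index_names):
        """Drop entries that every listed index has processed.

        Identifiers of dropped entries are not handed out again.
        Returns the number of entries dropped.
        """
        safe = min(
            (self.index_progress.get(name, self.first_id) for name in index_names),
            default=self.next_id,
        )
        drop = safe - self.first_id
        if drop > 0:
            del self.tids[:drop]
            self.first_id = safe
        return drop
```

In this model each index can catch up on its own schedule, and the
oldest entries can only be discarded once the slowest index has passed
them.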

Have you started any work on this project? I think that it's a very good idea.

Enabling index-only scans is a good enough reason to pursue this
project, even on its own. The flexibility that this design offers
allows VACUUM to run far more aggressively, with little possible
downside. It makes it possible for VACUUM to run so frequently that it
rarely dirties pages -- at least in many important cases. Imagine if
VACUUM almost kept in lockstep with inserters into
an append-mostly table -- that would be great. The main blocker to
making VACUUM behave like that is of course indexes.

Setting visibility map bits during VACUUM can make future vacuuming
cheaper (for the obvious reason), which *also* makes it cheaper to set
*most* visibility map bits as the table is further extended, which in
turn makes future vacuuming cheaper...and so on. This virtuous circle
seems like it might be really important, especially once you factor in
the cost of dirtying pages a second or a third time. I think that we
can really keep the number of times VACUUM dirties pages under
control, simply by decoupling table vacuuming from index vacuuming.
Decoupling is key to keeping the costs to a minimum.
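
The "obvious reason" above is that VACUUM can skip pages whose
visibility map bit is already set. A toy Python model of that effect
follows. It is a sketch under strong assumptions: the table is
append-only, every page that VACUUM scans ends up all-visible, and the
function names and the keep_vm_bits switch are invented for this
illustration. It does not model index vacuuming or page dirtying.

```python
def vacuum_pass(all_visible, total_pages):
    """Scan only pages whose VM bit is unset, then set their bits.

    Returns the number of pages this pass had to read.
    """
    to_scan = [page for page in range(total_pages) if page not in all_visible]
    all_visible.update(to_scan)  # assumption: every scanned page becomes all-visible
    return len(to_scan)


def simulate(cycles, pages_added_per_cycle, keep_vm_bits=True):
    """Extend an append-only table each cycle and vacuum it right afterwards.

    With keep_vm_bits=False, all VM bits are cleared before each pass,
    which models a table whose bits never stay set.
    """
    all_visible = set()
    total_pages = 0
    scanned = []
    for _ in range(cycles):
        total_pages += pages_added_per_cycle
        if not keep_vm_bits:
            all_visible.clear()
        scanned.append(vacuum_pass(all_visible, total_pages))
    return scanned


print(simulate(4, 100))                      # [100, 100, 100, 100]
print(simulate(4, 100, keep_vm_bits=False))  # [100, 200, 300, 400]
```

When the bits stay set, the work per pass tracks the newly added pages
rather than the size of the whole table.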

I attached a POC autovacuum logging instrumentation patch that shows
how VACUUM uses *and* sets VM bits. I wrote this for my TPC-C + FSM
work. Seeing both things together, and seeing how both things *change*
over time was a real eye opener for me: it turns out that the master
branch keeps setting and then clearing VM bits for pages in the two big
append-mostly tables that are causing so much trouble for Postgres
today. What we see right now is pretty disorderly -- the numbers don't
trend in the right direction when they should. But it could be a lot
more orderly, with a little work.

This instrumentation helped me to discover a better approach to
indexing within TPC-C, based on index-only scans [1]. It also made me
realize that it's possible for a table to have real problems with dead
tuple cleanup in indexes, while nevertheless being an effective target
for index-only scans. There is actually no good reason to think that
one condition should preclude the other -- they may very well go
together. You did say this yourself when talking about global indexes,
but there is no reason to think that it's limited to partitioning
cases. The current "ANALYZE dead_tuples statistics" paradigm cannot
recognize when both conditions go together, even though I now think
that it's fairly common. I also like your idea here because it enables
a more qualitative approach, based on recent information for recently
modified blocks -- not whole-table statistics. Averages are
notoriously misleading.
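
A small worked example of why a whole-table average can hide a local
problem. The numbers (1000 blocks, 100 tuples per block, dead tuples
confined to the last 10 blocks) are invented purely for illustration,
and both function names are hypothetical; nothing here reflects how
ANALYZE actually computes its statistics.

```python
def dead_fraction_whole_table(dead_per_block, tuples_per_block):
    """Fraction of dead tuples averaged over every block in the table."""
    return sum(dead_per_block) / (len(dead_per_block) * tuples_per_block)


def dead_fraction_recent(dead_per_block, tuples_per_block, recent_blocks):
    """Fraction of dead tuples in only the last `recent_blocks` blocks."""
    if recent_blocks <= 0:
        raise ValueError("recent_blocks must be positive")
    recent = dead_per_block[-recent_blocks:]
    return sum(recent) / (len(recent) * tuples_per_block)


# 990 clean blocks followed by 10 recently modified blocks that are half dead.
dead_per_block = [0] * 990 + [50] * 10

print(dead_fraction_whole_table(dead_per_block, 100))  # 0.005
print(dead_fraction_recent(dead_per_block, 100, 10))   # 0.5
```

The table-wide figure looks negligible even though the recently
modified blocks are half dead.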

[1] https://github.com/pgsql-io/benchmarksql/pull/16
--
Peter Geoghegan

Attachment: 0001-Instrument-pages-skipped-by-VACUUM.patch (application/octet-stream, 13.0 KB)
