Re: should vacuum's first heap pass be read-only?

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: should vacuum's first heap pass be read-only?
Date: 2022-04-05 12:44:27
Message-ID: CA+TgmoY=Mikisws=wht01enANyTJ58DGvSC6Ubrpn5TET2Wnrw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Apr 5, 2022 at 6:19 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> I don't think that saving extra index passes is really a small gain.
> I think this will save a lot of IO if indexes pages are not in shared
> buffers because here we are talking about we can completely avoid the
> index passes for some of the indexes if it is already done. And if
> this is the only advantage then it might not be worth adding this
> infrastructure but what about global indexes?

Sure, I agree that the gain is large when the situation arises -- but
in practice I think it's pretty rare that the dead TID array can't fit
in maintenance_work_mem. In ten years of doing PostgreSQL support,
I've seen only a handful of cases where # of index scans > 1, and
those were solved by just increasing maintenance_work_mem until the
problem went away. AFAICT, there's pretty much nobody who can't fit
the dead TID list in main memory. They just occasionally don't
configure enough memory for it to happen. It makes sense if you think
about the math. Say you run with maintenance_work_mem=64MB. That's
enough for 10 million dead TIDs. With default settings, the table
becomes eligible for vacuuming when the number of updates and deletes
exceeds 20% of the table. So to fill up that amount of memory, you
need the table to have more than 50 million tuples. If you estimate
(somewhat randomly) 100 tuples per page, that's 500,000 pages, or
roughly 4GB. If you have a 4GB table, you don't have a problem with
using 64MB of memory to vacuum it. And similarly if you have a 64GB
table, you don't have a problem with using 1GB of memory to vacuum it.
Practically speaking, if we made work memory for autovacuum unlimited,
and allocated on demand as much as we needed, I bet almost nobody would
have an issue.
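
In case it helps to see the arithmetic laid out, here's the same
back-of-the-envelope calculation as a toy C snippet. The constants are
just the assumptions stated above -- 6 bytes per dead TID, 8kB pages,
the default 20% autovacuum threshold, and a rough guess of 100 tuples
per page -- not anything taken from the actual code:

    /* Back-of-the-envelope sketch of the numbers above. */
    #include <stdio.h>

    int
    main(void)
    {
        double  work_mem = 64.0 * 1024 * 1024;   /* maintenance_work_mem = 64MB */
        double  tid_size = 6.0;                  /* bytes per dead TID */
        double  scale_factor = 0.2;              /* fraction dead before autovacuum */
        double  tuples_per_page = 100.0;         /* the rough guess from the text */

        double  dead_tids = work_mem / tid_size;              /* ~11 million */
        double  table_tuples = dead_tids / scale_factor;      /* ~55 million */
        double  table_pages = table_tuples / tuples_per_page; /* ~550,000 */
        double  table_gb = table_pages * 8192.0 / (1024 * 1024 * 1024);

        printf("%.0f dead TIDs -> %.0f tuples -> %.0f pages -> %.1f GB\n",
               dead_tids, table_tuples, table_pages, table_gb);
        return 0;
    }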

> Because if we have global indexes then we must need this
> infrastructure to store the dead items for the partition because for
> example after vacuuming 1000 partitions while vacuuming the 1001st
> partition if we need to vacuum the global index then we don't want to
> rescan all the previous 1000 partitions to regenerate those old dead
> items right? So I think this is the actual use case where we
> indirectly skip the heap vacuuming for some of the partitions before
> performing the index vacuum.

Well, I agree. But the problem is what development path we should
pursue to get there. We want to do something that's going to make
sense if and when we eventually get global indexes, but which is
going to give us a good amount of benefit in the meantime, and which
also doesn't involve having to make too many changes to the code at
the same time.
today -- two heap passes with an index pass in the middle, but now
with the conveyor injected -- because it keeps the code changes as
simple as possible. And perhaps we should start by doing just that
much. But now that I've realized that the benefit of doing only that
much is so little, I'm a lot less convinced that it is a good first
step. Any hope of getting a more significant benefit out of the
conveyor belt stuff relies on our ability to get more decoupling, so
that we can, for example, collect dead TIDs on Tuesday, vacuum the
indexes on Wednesday, and set the dead TIDs unused on Thursday, doing
other things in between.
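
To make the kind of decoupling I have in mind a bit more concrete,
here's a purely hypothetical sketch -- none of these names exist
anywhere, it's only meant to show the shape of it:

    /* Illustrative only, not actual PostgreSQL code: the three stages as
     * separately schedulable tasks, with the dead-TID conveyor belt as the
     * handoff point between them. */
    typedef enum DecoupledVacuumTask
    {
        TASK_COLLECT_DEAD_TIDS,  /* heap pass 1: prune and push dead TIDs
                                  * onto the conveyor (e.g. Tuesday) */
        TASK_VACUUM_ONE_INDEX,   /* per index, whenever that index needs it:
                                  * remove TIDs already on the conveyor
                                  * (e.g. Wednesday) */
        TASK_MARK_LP_UNUSED      /* heap pass 2: set line pointers unused,
                                  * but only for TIDs that every index has
                                  * already processed (e.g. Thursday) */
    } DecoupledVacuumTask;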

And from that point of view I see two problems. One problem is that I
do not think we want to force all vacuuming through the conveyor belt
model. It doesn't really make sense for a small table with no
associated global indexes. And so then there is a code structure
issue: how do we set things up so that we can vacuum as we do today,
or alternatively vacuum in completely separate stages, without filling
the code up with a million "if" statements? The other problem is
understanding whether it's really feasible to postpone the index
vacuuming and the second heap pass in realistic scenarios. Postponing
index vacuuming and the second heap pass means that dead line pointers
remain in the heap, and that can drive bloat via line pointer
exhaustion. The whole idea of decoupling table and index vacuum
supposes that there are situations in which it's worth performing the
first heap pass where we gather the dead line pointers but where it's
not necessary to follow that up as quickly as possible with a second
heap pass to mark dead line pointers unused. I think Peter and I are
in agreement that there are situations in which some indexes need to
be vacuumed much more often than others -- but that doesn't matter if
the heap needs to be vacuumed more frequently than anything else,
because you can't do that without first vacuuming all the indexes.
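
As a footnote on the line pointer exhaustion point above, the per-page
budget is pretty small. Roughly, assuming default 8kB blocks and the
usual alignment -- these are approximations, not the real macros:

    /* Rough illustration: with an ~24-byte page header, 4-byte line
     * pointers, and ~24 bytes (aligned) of minimal tuple header, a heap
     * page tops out at roughly 290 line pointers. Dead-but-not-yet-unused
     * line pointers count against that cap, which is why postponing the
     * second heap pass indefinitely can drive bloat. */
    #include <stdio.h>

    int
    main(void)
    {
        int     blcksz = 8192;      /* default block size */
        int     page_header = 24;   /* approximate page header size */
        int     line_pointer = 4;   /* ItemIdData is 4 bytes */
        int     min_tuple = 24;     /* MAXALIGN'd minimal heap tuple header */

        printf("max line pointers per page: ~%d\n",
               (blcksz - page_header) / (min_tuple + line_pointer));
        return 0;
    }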

--
Robert Haas
EDB: http://www.enterprisedb.com
