Re: Allow single table VACUUM in transaction block

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Allow single table VACUUM in transaction block
Date: 2022-11-06 20:40:30
Message-ID: CAH2-WznnWomgJLXj-VQ2m1LZqArBtRsNuL09E0Bg9Wx_memu7A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Nov 6, 2022 at 11:14 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> In general, I do not believe in encouraging users to run VACUUM
> manually in the first place. We would be far better served by
> spending our effort to improve autovacuum's shortcomings.

I couldn't agree more. A lot of problems seem related to the idea that
VACUUM is just a command that the DBA periodically runs to get a
predictable fixed result, a little like CREATE INDEX. That conceptual
model isn't exactly wrong; it just makes it much harder to apply any
kind of context about the needs of the table over time. There is a
natural cycle to how VACUUM (really autovacuum) is run, and the
details matter.

There is a significant amount of relevant context that we can't really
use right now. That wouldn't be true if VACUUM only ran within an
autovacuum worker (by definition). The VACUUM command itself would
still be available, and support the same user interface, more or less.
Under the hood the VACUUM command would work by enqueueing a VACUUM
job, to be performed asynchronously by an autovacuum worker. Perhaps
the initial enqueue operation could be transactional, fixing Simon's complaint.

"No more VACUUMs outside of autovacuum" would enable more advanced
autovacuum.c scheduling, allowing us to apply a lot more context about
the costs and benefits, without having to treat manual VACUUM as an
independent thing. We could coalesce together redundant VACUUM jobs,
suspend and resume VACUUM operations, and have more strategies to deal
with problems as they emerge.

> I'd like to see some sort of direct attack on its inability to deal
> with temp tables, for instance. (Force the owning backend to
> do it? Temporarily change the access rules so that the data
> moves to shared buffers? Dunno, but we sure haven't tried hard.)

This is a good example of the kind of thing I have in mind. Perhaps it
could work by killing the backend that owns the temp relation when
things truly get out of hand? I think that that would be a perfectly
reasonable trade-off.

Another related idea: better behavior in the event of a manually
issued VACUUM (now just an enqueued autovacuum) that cannot do useful
work due to the presence of a long running snapshot. The VACUUM
doesn't have to dutifully report "success" when there is no practical
sense in which it was successful. There could be a back and forth
conversation between autovacuum.c and vacuumlazy.c that makes sure
that something useful happens sooner or later. The passage of time
really matters here.

As a bonus, we might be able to get rid of the autovacuum GUC
variants. Plus the current autovacuum logging would just be how we'd
log every VACUUM.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2022-11-06 22:38:42 Re: O(n) tasks cause lengthy startups and checkpoints
Previous Message Tom Lane 2022-11-06 20:12:52 Re: Free list same_input_transnos in preprocess_aggref