Re: Allow single table VACUUM in transaction block

From: Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Allow single table VACUUM in transaction block
Date: 2022-11-07 08:20:32
Message-ID: CANbhV-EV=Xwhz+nZ+1qwbFOQWgEi-z9drRgzJeETpc02wOVjsg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, 6 Nov 2022 at 20:40, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
> On Sun, Nov 6, 2022 at 11:14 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > In general, I do not believe in encouraging users to run VACUUM
> > manually in the first place. We would be far better served by
> > spending our effort to improve autovacuum's shortcomings.
>
> I couldn't agree more. A lot of problems seem related to the idea that
> VACUUM is just a command that the DBA periodically runs to get a
> predictable fixed result, a little like CREATE INDEX. That conceptual
> model isn't exactly wrong; it just makes it much harder to apply any
> kind of context about the needs of the table over time. There is a
> natural cycle to how VACUUM (really autovacuum) is run, and the
> details matter.
>
> There is a significant amount of relevant context that we can't really
> use right now. That wouldn't be true if VACUUM only ran within an
> autovacuum worker (by definition). The VACUUM command itself would
> still be available, and support the same user interface, more or less.
> Under the hood the VACUUM command would work by enqueueing a VACUUM
> job, to be performed asynchronously by an autovacuum worker. Perhaps
> the initial enqueue operation could be transactional, fixing Simon's complaint.

Ah, I see you got to this idea first!

Yes, what we need is for the "VACUUM command" to not fail in a script.
Not sure anyone cares where the work takes place.

Enqueuing a request for autovacuum to do that work, then blocking
until it is complete would do the job.

> "No more VACUUMs outside of autovacuum" would enable more advanced
> autovacuum.c scheduling, allowing us to apply a lot more context about
> the costs and benefits, without having to treat manual VACUUM as an
> independent thing. We could coalesce together redundant VACUUM jobs,
> suspend and resume VACUUM operations, and have more strategies to deal
> with problems as they emerge.

+1, but clearly this would not make temp table VACUUMs work.

> > I'd like to see some sort of direct attack on its inability to deal
> > with temp tables, for instance. (Force the owning backend to
> > do it? Temporarily change the access rules so that the data
> > moves to shared buffers? Dunno, but we sure haven't tried hard.)

This was a $DIRECT attack on making temp tables work! ;-)

Temp tables are actually easier, since we don't need any of the
concurrency features we get with lazy vacuum. So the answer is to
always run a VACUUM FULL on temp tables since this skips any issues
with indexes etc..

We would need to check a few things first.... maybe something like
this (mostly borrowed heavily from COPY)

InvalidateCatalogSnapshot();
if (!ThereAreNoPriorRegisteredSnapshots() || !ThereAreNoReadyPortals())
ereport(WARNING,
(errcode(ERRCODE_INVALID_TRANSACTION_STATE),
errmsg("vacuum of temporary table ignored because
of prior transaction activity")));
CheckTableNotInUse(rel, "VACUUM");

> This is a good example of the kind of thing I have in mind. Perhaps it
> could work by killing the backend that owns the temp relation when
> things truly get out of hand? I think that that would be a perfectly
> reasonable trade-off.

+1

> Another related idea: better behavior in the event of a manually
> issued VACUUM (now just an enqueued autovacuum) that cannot do useful
> work due to the presence of a long running snapshot. The VACUUM
> doesn't have to dutifully report "success" when there is no practical
> sense in which it was successful. There could be a back and forth
> conversation between autovacuum.c and vacuumlazy.c that makes sure
> that something useful happens sooner or later. The passage of time
> really matters here.

Regrettably, neither vacuum nor autovacuum waits for xmin to change;
perhaps it should.

--
Simon Riggs http://www.EnterpriseDB.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Karina Litskevich 2022-11-07 08:55:32 Re: Error for WITH options on partitioned tables
Previous Message Peter Smith 2022-11-07 08:16:37 Re: Perform streaming logical transactions by background workers and parallel apply