Re: Feedback on getting rid of VACUUM FULL

From: Hannu Krosing <hannu(at)krosing(dot)net>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Feedback on getting rid of VACUUM FULL
Date: 2009-09-16 18:37:13
Message-ID: 1253126233.778.26.camel@hvost1700
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 2009-09-16 at 11:10 -0700, Josh Berkus wrote:
> Hackers,
>
> Here's the feedback on replacing VACUUM FULL with VACUUM REWRITE:
> http://it.toolbox.com/blogs/database-soup/getting-rid-of-vacuum-full-feedback-needed-33959
>
> Of note:
>
> a) To date, I have yet to hear a single person bring up an actual
> real-life use-case where VACUUM FULL was desireable and REWRITE would
> not be.

The only case is when you are out of disk space and can't afford to
write out a full set of live rows.

What I'd like to propose is VACUUM FULL CONCURRENTLY, which similar to
VACUUM CONCURRENTLY, would actually do the compaction phase, that is,
move simultaneously from two directions, from start, to find empty space
and from end to find tuples. for each sufficiently large empty space the
forward scan finds it would take one or more tuples from the reverse
scan and then "null update" those to the empty space found by the
free-space-scan beginning. it should do that in small chunks, say one
page at a time, so it will minimally interfere with OLTP loads.

Once these two scans meet, you can stop and either run an non full
vacuum, or just continue in similar fashion to non-full vacuum and do
the cleanups of indexes and heap.

You may need to repeat this a few times to get actual shrinkage but it
has the very real advantage of being usable on 24/7 systems, which
neither VACUUM FULL nor CLUSTER possess.

At some point I actually had external scripts doing similar stuff for
on-line table shrinking, the only difference being that I could not move
the tuple towards beginning right away (pg preferred in-page updates)
and had to keep doing null updates (id=id where id) until the page
number in ctid changed.

> Lots of people have said something hypothetical, but nobody has
> come forward with a "I have this database X and several times Y
> happened, and only FULL would work ...". This makes me think that there
> very likey are no actual use cases where we need to preserve FULL.
>
> b) Several people have strongly pushed for a phased removal of FULL over
> more than one PG version, with a warning message about depreciation.
>
> c) Vivek had some points about required implementation:
>
> "However, there still must be a way to compact the tables that is mvcc
> safe. From what I have read and recall, cluster is not. Thus, the vacuum
> rewrite would be a mandatory feature (or cluster could be made mvcc safe)."
>
> Is Vivek correct about this? News to me ...

It used to be true at some point, probably not true any more.

IIRC, the problem was, that old table was not locked during rewrite and
thus some code could be updating the old heap even while the data had
been muved to the new one.

> --
> Josh Berkus
> PostgreSQL Experts Inc.
> www.pgexperts.com
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Hannu Krosing 2009-09-16 18:40:09 Re: Feedback on getting rid of VACUUM FULL
Previous Message Michael Glaesemann 2009-09-16 18:34:17 Re: PATCH: make plpgsql IN args mutable (v1) [REVIEW]