tool for incrementally shrinking bloated tables

From: Paul Tillotson <pntil(at)shentel(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Subject: tool for incrementally shrinking bloated tables
Date: 2004-12-20 22:20:31
Message-ID: 41C7502F.7020905@shentel.net
Lists: pgsql-hackers

Goal: on a production server, to gradually shrink a table (no matter how
large) back to < 10% free space without "noticeably" interrupting write
access to it ("noticeably" = without taking any exclusive locks for
more than a few seconds at a time).

I am thinking about building this if it proves not too difficult.

To accomplish this, tuples need to be moved into free space at the
beginning of the table, and then the table must be shrunk using ftruncate().
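
As a small illustration of the tuple-movement half (a sketch against a
hypothetical table "t" with an "id" and a "foo" column; the ctid values are
made up, and the new row version only lands in an early page if the free
space map actually holds free space there):

    -- assumes a hypothetical table:  CREATE TABLE t (id int PRIMARY KEY, foo text);
    SELECT ctid, id FROM t WHERE id = 1;   -- e.g. (51234,7): live row near the end
    UPDATE t SET foo = foo WHERE id = 1;   -- writes a new version of the row
    SELECT ctid, id FROM t WHERE id = 1;   -- e.g. (12,3): the new version landed in
                                           -- free space near the front, because that
                                           -- is where the FSM pointed

With change (a) below in place, the FSM would preferentially hold the
earliest pages, so an UPDATE like this would tend to migrate tuples toward
the front rather than to wherever the largest free chunks happen to be.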

It seems that I could make these two changes:

(a) Modify the VACUUM command to take an option meaning "pack the free
space map with the pages that occur earliest in the table rather than
the pages with the most free space."

(b) Create a command that will take an exclusive lock, scan the table
backwards until it comes to a tuple that cannot be removed (i.e., a
tuple that is not HEAPTUPLE_DEAD; see scan_heap() in
src/backend/commands/vacuum.c) or until some preset amount of time has
elapsed, and then ftruncate() the table.

To use this system, one would do the following:

(1) VACUUM KEEP_EARLY_FREE_PAGES mybloatedtable;
    -- uses item (a) discussed above
(2) UPDATE mybloatedtable SET foo = foo WHERE ctid > '(nnnnn, 0)';
    -- moves tuples at the end of the table to the front
(3) SHRINK TABLE mybloatedtable;
    -- uses item (b) discussed above

Then repeat as many times as necessary to accomplish the desired shrinking.
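
For concreteness, here is a rough sketch of what one pass might look like,
assuming the proposed commands from (a) and (b) existed (they do not today);
the 90% cutoff and the block number 12345 are arbitrary placeholders that a
small driver script would compute and substitute, using the relpages count
that the preceding VACUUM refreshes in pg_class:

    VACUUM KEEP_EARLY_FREE_PAGES mybloatedtable;   -- proposed option from (a)

    -- pick a cutoff block number, e.g. 90% of the current page count,
    -- using the relpages figure the VACUUM just stored in pg_class
    SELECT (relpages * 0.9)::int AS cutoff_block
    FROM pg_class WHERE relname = 'mybloatedtable';

    -- move live tuples found beyond the cutoff toward the front
    -- (substitute the cutoff block number reported above for 12345)
    UPDATE mybloatedtable SET foo = foo WHERE ctid > '(12345,0)';

    SHRINK TABLE mybloatedtable;                   -- proposed command from (b)

Each pass can only truncate whatever dead space has been pushed to the very
end of the table, so the driver would simply loop until relpages stops
shrinking or reaches the target size.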

In defense of the need for this tool: although this kind of bloat is
usually preventable by proper vacuuming and FSM configuration, I often
see people on the list say that they have a "huge" multi-gigabyte table
that is using up all their drive space, but that they cannot afford the
interruption a VACUUM FULL would entail. Also, certain maintenance
operations (e.g., adding a column and populating it within a single
transaction) can double the on-disk size of a table, not to mention user
errors such as running an unconstrained UPDATE inside a transaction and
then rolling it back.

Comments? Am I missing some obvious way of accomplishing this goal? Is
anyone working on something like this?

Paul Tillotson
