Re: tool for incrementally shrinking bloated tables

From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Paul Tillotson <pntil(at)shentel(dot)net>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: tool for incrementally shrinking bloated tables
Date: 2004-12-22 16:39:20
Message-ID: 20041222163920.GD29278@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I like the idea generally, just some comments:

On Tue, Dec 21, 2004 at 07:09:39PM -0500, Paul Tillotson wrote:
> (1) VACUUM KEEP_EARLY_FREE_PAGES mybloatedtable; -- item (a)

This may not be necessary anyway, but could improve performance.

> (2) UPDATE mybloatedtable SET foo = foo WHERE ctid > '(nnnnn, 0)';
> --move tuples in end of the table to the front.

This is a neat way ot acheive the moving safely, but like you said
you'd need to invent the appropriate operator. And make sure it does
get put near the beginning of the table.

You have to make sure the tuple is really dead and you have to make
sure you update the index at the same time, although maybe dead tuples
are already excluded from the index.

> (3) SHRINK TABLE mybloatedtable; -- item (b)

I assume you're going to need a EXCLUSIVE table lock here. You don't
want this happening:

Process 1: Look at table to find last active tuple
Process 2: Insert tuple at end of table
Process 1: ftruncate

Whoops, you just blatted the new tuple.

That said, it wouldn't be too hard to write something like:

1. LOCK EXCLUSIVE table (actually allowing reads would be ok)
2. Start scanning from 100KB from end of file and note last used page
3. ftruncate

> Then repeat as many times as necessary to accomplish the desired shrinking.

I'm wondering about the value of all this anyway, the number of places
this would be useful doesn't seem especially large anyway. Essentially
when you've stuffed up the FSM configuration...

Hope this helps,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message A. Mous 2004-12-22 16:40:57 Re: 8.0 Beta3 worked, RC1 didn't!
Previous Message John Cunningham 2004-12-22 16:38:01 Re: postgresql.conf