Proposal for background vacuum full/cluster

From: "Jim C(dot) Nasby" <decibel(at)decibel(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Proposal for background vacuum full/cluster
Date: 2005-04-20 22:05:56
Message-ID: 20050420220556.GU58835@decibel.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I talked to a few people on IRC about this and they didn't think I was
nuts, so maybe this is something practical...

In a nutshell, my idea is to use the normal transactional/XID code to
relocate tuples in the heap. Think of doing an UPDATE field=field if you
could tell update what page to put the new tuple on. Using this
mechanism, you can move tuples from the end of the heap to pages that
have free space on them. The dead tuples at the end of the heap could
then be vacuumed conventionally, and completely empty pages removed by
that vacuum.

Of course, it's not quite that simple. For starters, you'd want to do a
conventional vacuum before this, both to free as much space as possible
and to update the FSM. It might also be necessary to prevent backends
from using the pages at the end of the heap (which you're trying to
empty). I'm guessing that could be done just by removing the pages from
the FSM. You'd also need to vacuum after emptying these pages to reclaim
the disk space. To facilitate these things, it might be useful to be
able to vacuum parts of the heap. So as pages are emptied at the end of
the heap, they can be vacuumed and reclaimed while the pages are still
probably in cache (and without requiring a re-vacuum of the entire
table).

Taking this technique one step further, it should also be possible to
cluster in the background without blocking everything. One way to do
this would be to empty the first page in the heap by moving it's tuples
elsewhere, and vacuuming that page (but not putting it in the FSM). Once
that page is available, you can start reading in from the clustering
index and moving those tuples to the first page.

One thing that might be an issue for both ideas is index bloat. But
since reindex is a non-blocking operation, it doesn't seem unreasonable
to either do that automatically or have the user do it.

Is this TODOable?
--
Jim C. Nasby, Database Consultant decibel(at)decibel(dot)org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jim C. Nasby 2005-04-20 22:08:30 Re: Postgres: pg_hba.conf, md5, pg_shadow, encrypted passwords
Previous Message Tom Lane 2005-04-20 22:03:18 Re: Postgres: pg_hba.conf, md5, pg_shadow, encrypted passwords