Space reuse and autovacuum

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Space reuse and autovacuum
Date: 2005-06-21 20:06:52
Message-ID: 200506212006.j5LK6qR23759@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

Gavin Sherry wrote:
> On Wed, 15 Jun 2005, Bruce Momjian wrote:
>
> >
> > I am going to start working on it. I am concerned it is a big job.
> >
> > I will post questions as I find them, and the one below is a good one.
> >
>
> I'm wondering if effort is being misdirected here. I remember when Mark
> Wong at OSDL was running pg_autovacuum during a dbt run, he was seeing
> significant performance loss -- I think on the order of 30% to 40% (I will
> try and dig up a link to the results).
>
> I think these results can be dramatically improved if the focus is on a
> more effective vacuum.

Let's look at what TODO has for vacuum and how autovacuum fits that:

Vacuum
======

* Improve speed with indexes

For large table adjustements during vacuum, it is faster to reindex
rather than update the index.

This is something we should figure out how to do automatically.

* Reduce lock time by moving tuples with read lock, then write
lock and truncate table

Moved tuples are invisible to other backends so they don't require a
write lock. However, the read lock promotion to write lock could lead
to deadlock situations.

The deadlock problem here seems bad.

* -Add a warning when the free space map is too small

Done.

* Maintain a map of recently-expired rows

This allows vacuum to target specific pages for possible free space
without requiring a sequential scan.

I think of this as a secondary Free-space-map (FSM), where instead of
recording rows/pages that have free space, we records rows/pages that
have expired rows that might be free for reuse if all transactions where
the are visible are completed.

* Auto-fill the free space map by scanning the buffer cache or by
checking pages written by the background writer

This could be used to populate the secondary FSM above.

* Create a bitmap of pages that need vacuuming

Instead of sequentially scanning the entire table, have the background
writer or some other process record pages that have expired rows, then
VACUUM can look at just those pages rather than the entire table. In
the event of a system crash, the bitmap would probably be invalidated.

This is an alternative to the FSM that tracks _all_ possible free space
rather than just a limited amount like a seconary FSM in shared memory.

* Auto-vacuum
o Move into the backend code
o Use free-space map information to guide refilling
o Do VACUUM FULL if table is nearly empty?

It seems no matter what TODO items we complete above, we will need some
type of automatic vacuum to direct filling the free space map. It might
be done using a different method than a sequential scan vacuum, but it
will be needed, so we are good to integrate autovacuum then improve how
it does its job in future releases.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Magnus Hagander 2005-06-21 20:34:48 Removing Kerberos 4
Previous Message Sean Davis 2005-06-21 19:43:57 Re: Debugging PL/pgSQL

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2005-06-21 20:14:07 Re: Function's LEAST, GREATEST and DECODE (Oracle vararg
Previous Message Simon Riggs 2005-06-21 19:56:13 Strange logic for partial index proving