Re: Dead Space Map for vacuum

From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Russell Smith" <mr-russ(at)pws(dot)com(dot)au>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>, "Gavin Sherry" <swm(at)linuxworld(dot)com(dot)au>, "ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Dead Space Map for vacuum
Date: 2006-12-29 22:49:49
Message-ID: 1167432590.3903.301.camel@silverbirch.site
Lists: pgsql-hackers

On Sat, 2006-12-30 at 09:22 +1100, Russell Smith wrote:
> Simon Riggs wrote:
> > FSM code ignores any block with less space than 1 average tuple, which
> > is a pretty reasonable rule.
> >
> FSM serves a different purpose than DSM and therefore has an entirely
> different set of rules governing what it should and shouldn't be
> doing. This is a reasonable rule for FSM, but not for DSM.

VACUUM and space reuse are intimately connected. You cannot consider one
without considering the other.

> > If you only track whether a block has been updated, not whether it has
> > been updated twice, then you will be VACUUMing lots of blocks that have
> > only a 50% chance of being usefully stored by the FSM. As I explained,
> > the extra bit per block is easily regained from storing less FSM data.
> >
> Well, it seems that when implementing the DSM, it'd be a great time to
> move FSM from its current location in Shared Memory to somewhere
> else. Possibly the same place as DSM. A couple of special blocks per
> file segment would be a good place. Also I'm not sure that the point of
> VACUUMing is always to be able to immediately reuse the space.
> There are cases where large DELETE's are done, and you just want to
> decrease the index size.

I can see the argument in favour of reducing index size, but do we want
to perform a read *and* a write IO to remove *one* heap tuple, just so
we can remove a single tuple from index(es)?

We might want to eventually, but I'm proposing keeping track of that so
that we can tell the difference between single dead tuples and more
efficient targets for our VACUUM. When there are no better targets,
sure, we'll be forced to reach for the high fruit.
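The tracking being proposed here can be sketched in a few lines. This is purely an illustrative model, not PostgreSQL's actual DSM design: the names, the two-state encoding, and the selection policy are all invented for the example.

```python
# Illustrative sketch of per-block dead-tuple tracking: each heap block
# is CLEAN, has exactly one dead tuple, or has two or more. Two bits per
# block are enough to make that distinction. All names are hypothetical.

CLEAN, ONE_DEAD, MANY_DEAD = 0, 1, 2

class DeadSpaceMap:
    def __init__(self, nblocks):
        self.state = [CLEAN] * nblocks

    def note_dead_tuple(self, block):
        # The first dead tuple marks the block ONE_DEAD; a second
        # promotes it to MANY_DEAD.
        if self.state[block] == CLEAN:
            self.state[block] = ONE_DEAD
        else:
            self.state[block] = MANY_DEAD

    def vacuum_targets(self):
        # Prefer blocks with two or more dead tuples: there the
        # read+write I/O is repaid with more reclaimed space.
        return [b for b, s in enumerate(self.state) if s == MANY_DEAD]

dsm = DeadSpaceMap(8)
dsm.note_dead_tuple(3)       # block 3: a single dead tuple, skipped
dsm.note_dead_tuple(5)
dsm.note_dead_tuple(5)       # block 5: two dead tuples, a good target
print(dsm.vacuum_targets())  # -> [5]
```

Only when no MANY_DEAD blocks remain would a VACUUM fall back to the ONE_DEAD blocks, which is the "high fruit" case above.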

The idea of DSM is to improve the efficiency of VACUUM by removing
wasted I/Os (and again I say, I back DSM). The zero-dead-tuples blocks
are obvious targets to avoid, but avoiding them only avoids one read
I/O. Avoiding the single-dead-tuple blocks avoids two I/Os, at only a
small loss in reclaimed space. That makes them poor targets for a
selective VACUUM.

When we are picking just some of the blocks in a table, we will quickly
move from sequential to random I/O, so we must be careful and
conservative about the blocks we pick, otherwise DSM VACUUM will not be
any better than VACUUM as it is now.
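One conservative policy along these lines would be to vacuum only candidate blocks that form dense contiguous runs, so the scan stays mostly sequential. This is a hypothetical sketch; the run-length threshold is an invented tuning knob, not a real setting.

```python
# Hypothetical block-selection policy: group candidate block numbers
# into contiguous runs and keep only runs long enough to be worth a
# sequential pass, discarding isolated blocks that would cost random I/O.

def dense_runs(candidates, min_run=2):
    """Group sorted block numbers into contiguous runs, keeping only
    runs of at least min_run blocks."""
    runs, run = [], []
    for b in sorted(candidates):
        if run and b == run[-1] + 1:
            run.append(b)
        else:
            if len(run) >= min_run:
                runs.append(run)
            run = [b]
    if len(run) >= min_run:
        runs.append(run)
    return runs

# Blocks 10-12 form a run worth a sequential pass; blocks 40 and 77 are
# isolated and would cost a random seek each for little payoff.
print(dense_runs([10, 11, 12, 40, 77]))  # -> [[10, 11, 12]]
```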

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
