Re: Dead Space Map

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Dead Space Map
Date: 2006-02-28 10:38:51
Message-ID: Pine.OSF.4.61.0602280908490.76425@kosh.hut.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 27 Feb 2006, Tom Lane wrote:

> Heikki Linnakangas <hlinnaka(at)iki(dot)fi> writes:
>> On Mon, 27 Feb 2006, Tom Lane wrote:
>>> This strikes me as a fairly bad idea, because it makes VACUUM dependent
>>> on correct functioning of user-written code --- consider a functional
>>> index involving a user-written function that was claimed to be immutable
>>> and is not.
>
>> If the user-defined function is broken, you're in more or less trouble
>> anyway.
>
> Less. A non-immutable function might result in lookup failures (not
> finding the row you need) but not in database corruption, which is what
> would ensue if VACUUM fails to remove an index tuple. The index entry
> would eventually point to a wrong table entry, after the table item slot
> gets recycled for another tuple.

It's easy to detect when it happens (that you don't find the row in the
index). You can then complain loudly or fall back to the full scan.

> Moreover, you haven't pointed to any strong reason to adopt this
> methodology. It'd only be a win when vacuuming pretty small numbers
> of tuples, which is not the design center for VACUUM, and isn't likely
> to be the case in practice either if you're using autovacuum. If you're
> removing say 1% of the tuples, you are likely to be hitting every index
> page to do it, meaning that the scan approach will be significantly
> *more* efficient than retail lookups.

It really depends a lot on what kind of indexes the table has, clustering
order, and what kind of deletions and updates happen. Assuming a random
distribution of dead tuples, a scan is indeed going to be more efficient
as you say.

Assuming a tightly packed bunch of dead tuples, however, say after a
"delete from log where timestamp < now() - 1 month", you would benefit
from a partial vacuum.

That's certainly something that needs to be tested. It's important to
implement so that it falls back to full scan when it's significantly
faster.

- Heikki

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Hans-Juergen Schoenig 2006-02-28 12:02:04 Re: new feature: LDAP database name resolution
Previous Message Hannu Krosing 2006-02-28 08:17:41 Re: Dead Space Map