Re: vacuum, performance, and MVCC

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Csaba Nagy <nagy(at)ecircle-ag(dot)com>, Hannu Krosing <hannu(at)skype(dot)net>, Mark Woodward <pgsql(at)mohawksoft(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, Christopher Browne <cbbrowne(at)acm(dot)org>
Subject: Re: vacuum, performance, and MVCC
Date: 2006-06-25 18:13:48
Message-ID: Pine.OSF.4.61.0606250954090.124217@kosh.hut.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, 24 Jun 2006, Bruce Momjian wrote:

> Because having them be on the same page is the only way you can update
> the page item pointer so when you recycle the row, you the indexes are
> now pointing to the new version. Pages look like:
>
> [marker][item1][item2][item3]...[tuple1][tuple2][tuple3]
>
> and indexes only point to items, not to tuples. This allows tuples to
> be compacted on the page without affecting the indexes.
>
> If tuple1 is updated to tuple2, once tuple1 is no longer visible to any
> backends, you can modify item1 to point to tuple2, and you can mark the
> space used by tuple1 as reusable:
>
> [marker][item1(tuple2)][item2][item3]...[free][tuple2][tuple3]

Ok, now I think I get it. So the limitation of old and new tuple being on
the same page is required to make it possible to remove the old tuple
without touching the indexes?

If you move the new tuple (logically, by modifying item pointers) on
vacuum, isn't there a risk that a concurrent seqscan misses it?

> If you can't expire the old row because one of the indexed columns was
> modified, I see no reason to try to reduce the additional index entries.

It won't enable early expiration, but it means less work to do on update.
If there's a lot of indexes, not having to add so many index tuples can be
a significant saving.

To summarise, we have two issues related to frequent updates:
1. Index and heap bloat, requiring frequent vacuum.
2. Updates are more expensive than on other DBMSs, because we have to add
a new index tuple in every index, even if none of the indexed columns are
modified.

Tom suggested that we just improve vacuum and autovacuum, and someone
brought up the dead space map idea again. Those are all worthwhile things
to do and help with vacuuming after deletes as well as updates, but they
only address issue 1. Mark's suggestion (assuming that it would've
worked) as well as yours address both, but only for updates.

- Heikki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2006-06-25 18:24:02 Re: vacuum, performance, and MVCC
Previous Message Jan Wieck 2006-06-25 17:30:45 Re: vacuum, performance, and MVCC