Re: vacuum, performance, and MVCC

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Csaba Nagy <nagy(at)ecircle-ag(dot)com>, Hannu Krosing <hannu(at)skype(dot)net>, Mark Woodward <pgsql(at)mohawksoft(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, Christopher Browne <cbbrowne(at)acm(dot)org>
Subject: Re: vacuum, performance, and MVCC
Date: 2006-06-25 18:52:44
Message-ID: 200606251852.k5PIqih25183@momjian.us
Lists: pgsql-hackers

Heikki Linnakangas wrote:
> On Sat, 24 Jun 2006, Bruce Momjian wrote:
>
> > Because having them be on the same page is the only way you can update
> > the page item pointer so when you recycle the row, the indexes are
> > now pointing to the new version. Pages look like:
> >
> > [marker][item1][item2][item3]...[tuple1][tuple2][tuple3]
> >
> > and indexes only point to items, not to tuples. This allows tuples to
> > be compacted on the page without affecting the indexes.
> >
> > If tuple1 is updated to tuple2, once tuple1 is no longer visible to any
> > backends, you can modify item1 to point to tuple2, and you can mark the
> > space used by tuple1 as reusable:
> >
> > [marker][item1(tuple2)][item2][item3]...[free][tuple2][tuple3]
>
> Ok, now I think I get it. So the limitation of old and new tuple being on
> the same page is required to make it possible to remove the old tuple
> without touching the indexes?

Yes, modifying the heap page item pointer is required for reuse.
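
As a rough illustration of that redirection, here is a minimal sketch in Python rather than PostgreSQL's C. The Page class, its fields, and the recycle() method are hypothetical names made up for this example, not actual PostgreSQL structures. The only things taken from the discussion are that indexes reference item slots, item slots reference tuples, and the old and new versions sit on the same page.

```python
class Page:
    """Toy heap page: item slots point at tuple storage slots."""

    def __init__(self):
        self.items = []    # items[i] -> position in self.tuples
        self.tuples = []   # tuple storage; None marks reusable space

    def insert(self, value):
        """Store a tuple and return the item number an index would hold."""
        self.tuples.append(value)
        self.items.append(len(self.tuples) - 1)
        return len(self.items) - 1

    def fetch(self, item_no):
        """Follow an index reference: item -> tuple."""
        return self.tuples[self.items[item_no]]

    def recycle(self, old_item, new_item):
        """Assumes the old version is no longer visible to any backend:
        point its item at the new version and free the old tuple space."""
        old_slot = self.items[old_item]
        self.items[old_item] = self.items[new_item]
        self.tuples[old_slot] = None


page = Page()
item1 = page.insert("tuple1")
item2 = page.insert("tuple2")   # updated version of tuple1, same page
page.insert("tuple3")

page.recycle(item1, item2)
print(page.fetch(item1))  # an index entry holding item1 now reaches tuple2
print(page.tuples)        # the first storage slot is marked reusable
```

No index is touched in recycle(): every index entry still holds item1, and only the page-local indirection changes.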

> If you move the new tuple (logically, by modifying item pointers) on
> vacuum, isn't there a risk that a concurrent seqscan misses it?

Well, you lock the page while updating the item pointer. Because the
versions are on the same page, a single page lock should be fine.
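
A minimal sketch of why one page lock is enough, again in Python with hypothetical names (LockedPage, redirect, scan). It uses a single mutex for simplicity; that is an assumption of the sketch, not a description of how PostgreSQL locks buffers. The point is only that a scan taking the same lock sees the page either before or after the redirect, never halfway through.

```python
import threading


class LockedPage:
    """Toy page whose item pointers and tuples are guarded by one lock."""

    def __init__(self, items, tuples):
        self.lock = threading.Lock()
        self.items = items     # items[i] -> position in self.tuples
        self.tuples = tuples   # None marks reusable space

    def redirect(self, old_item, new_item):
        """Repoint old_item at the new version and free the old space,
        all while holding the page lock."""
        with self.lock:
            old_slot = self.items[old_item]
            self.items[old_item] = self.items[new_item]
            self.tuples[old_slot] = None

    def scan(self):
        """Sequential scan of the page under the same lock."""
        with self.lock:
            return [t for t in self.tuples if t is not None]


page = LockedPage(items=[0, 1, 2], tuples=["tuple1", "tuple2", "tuple3"])
seen = []


def scanner():
    for _ in range(1000):
        seen.append(tuple(page.scan()))


t = threading.Thread(target=scanner)
t.start()
page.redirect(0, 1)
t.join()

# Every scan saw a complete "before" or "after" state of the page.
allowed = {("tuple1", "tuple2", "tuple3"), ("tuple2", "tuple3")}
print(all(s in allowed for s in seen))
```

In the "before" state a real scan would still apply visibility checks to decide between tuple1 and tuple2; the sketch leaves that out.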

> > If you can't expire the old row because one of the indexed columns was
> > modified, I see no reason to try to reduce the additional index entries.
>
> It won't enable early expiration, but it means less work to do on update.
> If there's a lot of indexes, not having to add so many index tuples can be
> a significant saving.

Already added to TODO.

* Reuse index tuples that point to heap tuples that are not visible to
anyone?

> To summarise, we have two issues related to frequent updates:
> 1. Index and heap bloat, requiring frequent vacuum.
> 2. Updates are more expensive than on other DBMSs, because we have to add
> a new index tuple in every index, even if none of the indexed columns are
> modified.
>
> Tom suggested that we just improve vacuum and autovacuum, and someone
> brought up the dead space map idea again. Those are all worthwhile things
> to do and help with vacuuming after deletes as well as updates, but they
> only address issue 1. Mark's suggestion (assuming that it would've
> worked) as well as yours address both, but only for updates.

Agreed.
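
To put rough numbers on issue 2, here is a small Python sketch that counts index tuple insertions for one update. The function names and the example table are hypothetical. The first function models the behaviour described above (a new index tuple in every index); the second models the idea under discussion of skipping indexes whose columns did not change, which is an assumption of the sketch and not existing behaviour.

```python
def index_inserts_every_index(indexes, changed_columns):
    """Behaviour described above: each update adds a tuple to every index,
    regardless of which columns changed."""
    return len(indexes)


def index_inserts_changed_only(indexes, changed_columns):
    """Hypothetical scheme: only indexes covering a changed column
    receive a new index tuple."""
    changed = set(changed_columns)
    return sum(1 for cols in indexes.values() if changed & set(cols))


# Hypothetical table with four indexes, keyed by index name.
indexes = {
    "idx_a": ["a"],
    "idx_b": ["b"],
    "idx_ab": ["a", "b"],
    "idx_c": ["c"],
}

# Update touching only a non-indexed column "d".
print(index_inserts_every_index(indexes, ["d"]),
      index_inserts_changed_only(indexes, ["d"]))

# Update touching indexed column "a".
print(index_inserts_every_index(indexes, ["a"]),
      index_inserts_changed_only(indexes, ["a"]))
```

With many indexes and updates that mostly touch non-indexed columns, the gap between the two counts is the saving Heikki describes.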

--
Bruce Momjian bruce(at)momjian(dot)us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
