Re: vacuum, performance, and MVCC

From: Hannu Krosing <hannu(at)skype(dot)net>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Jan Wieck <JanWieck(at)Yahoo(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Csaba Nagy <nagy(at)ecircle-ag(dot)com>, Mark Woodward <pgsql(at)mohawksoft(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, Christopher Browne <cbbrowne(at)acm(dot)org>, postgres hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: vacuum, performance, and MVCC
Date: 2006-06-25 19:42:01
Message-ID: 1151264521.4096.14.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Ühel kenal päeval, P, 2006-06-25 kell 14:24, kirjutas Bruce Momjian:
> Jan Wieck wrote:
> > >> Sure, but index reuse seems a lot easier, as there is nothing additional
> > >> to remember or clean out when doing it.
> > >
> > > Yes, seems so. TODO added:
> > >
> > > * Reuse index tuples that point to heap tuples that are not visible to
> > > anyone?
> > >
> > >> When reusing a heap tuple you have to clean out all index entries
> > >> pointing to it.
> > >
> > > Well, not for UPDATE for no key changes on the same page, if we do that.
> > >
> >
> > An update that results in all the same values of every indexed column of
> > a known deleted invisible tuple. This reused tuple can by definition not
> > be the one currently updated. So unless it is a table without a primary
> > key, this assumes that at least 3 versions of the same row exist within
> > the same block. How likely is that to happen?
>
> Good question. You take the current tuple, and make another one on the
> same page. Later, an update can reuse the original tuple if it is no
> longer visible to anyone (by changing the item id), so you only need two
> tuples, not three. My hope is that a repeated update would eventually
> move to a page that enough free space for two (or more) versions.

I can confirm that this is exactly what happens when running an
update-heavy load with frequent vacuums. Eventually most rows get their
own db pages or share the same page with 2-3 rows. And there will be
lots of unused (filed up, or cleaned and not yet reused) pages.

The overall performance could be made a little better by tuning the
system to not put more than N new rows on the same page at initial
insert or when the row move to a new page during update. Currently
several new rows are initially put on the same page and then move around
during repeated updates until they slow(ish)ly claim their own page.

> Does that help explain it?
>
--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hannu Krosing 2006-06-25 19:51:28 Re: vacuum, performance, and MVCC
Previous Message Jan Wieck 2006-06-25 19:21:27 Re: vacuum, performance, and MVCC