Re: vacuum, performance, and MVCC

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Hannu Krosing <hannu(at)skype(dot)net>
Cc: postgres hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: vacuum, performance, and MVCC
Date: 2006-06-26 14:50:26
Message-ID: 200606261450.k5QEoQh03466@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hannu Krosing wrote:
> ?hel kenal p?eval, E, 2006-06-26 kell 14:56, kirjutas Martijn van
> Oosterhout:
> > On Mon, Jun 26, 2006 at 07:17:31AM -0400, Bruce Momjian wrote:
> > > Correct! We use the same pointers used by normal UPDATEs, except we set
> > > a bit on the old tuple indicating it is a single-index tuple, and we
> > > don't create index entries for the new tuple. Index scan routines will
> > > need to be taught about the new chains, but because only one tuple in
> > > the chain is visible to a single backend, the callers should not need to
> > > be modified.
> >
> > I suppose we would also change the index_getmulti() function to return
> > a set of ctids plus flags so the caller knows to follow the chains,
> > right?
>
> It is probably better to always return the pointer to the head of CITC
> chain (the one an index points to) and do extra visibility checks and
> chain-following on each access. This would keep the change internal to
> tuple fetching functions.

So index_getnext() traverses the chain and returns one member per call.
Makes sense. Just realize you are in a single index entry returning
multiple tuples. We will need some record keeping to track that.

> > And for bitmap index scans you would only remember the page in
> > the case of such a tuple, since you can't be sure the exact ctid you've
> > got is the one you want.
>
> no, you should only use the pointer to CITC head outside tuple access
> funtions. And this pointer to CITC head is what is always passed to
> those access functions/macros.
>
> The VACUUM would run its passes thus:
>
> pass 1: run over heap, collect pointers to single dead tuples, and fully
> dead CITC chains (fully dead = no live tuples on this page). Clean up
> old tuples from CITC chains and move live tuples around so that CITC
> points to oldest possibly visible (not vacuumed) tuple. Doing this there
> frees us from need to collect a separate set of pointers for those. Or
> have you planned that old tuples from CITC chains are collected on the
> go/as needed ? Of course we could do both.

Non-visible CITC members should be freed during an UPDATE on the same
page, so vacuum doesn't have to be involved.

> pass 2: clean indexes based on ctid from pass 1
>
> pass 3: clean heap based on ctid from pass 1
>
> If yo do it this way, you dont need to invent new data structures to
> pass extra info about CITC internals to passes 2 and 3
>
> On more thing - when should free space map be notified about free space
> in pages with CITC chains ?

Uh, well, I am thinking we only free CITC space when we are going to use
it for an UPDATE, rather than free things while doing an operation. It
is good to keep the cleanup overhead out of the main path as much as
possible.

Also, seems I can't spell algorithms very well:

Definition: Single-Index-Tuple Chain (SITC)
-
Thinking of vacuum, right now it does these cleanups:

o non-visible UPDATEs on the same page with no key changes
o non-visible UPDATEs on the same page with key changes
o non-visible UPDATEs on different pages
o DELETEs
o aborted transactions

The big question is what percentage of dead space is the first one? My
guess is 65%.

--
Bruce Momjian bruce(at)momjian(dot)us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2006-06-26 14:56:59 Re: "Truncated" tuples for tuple hash tables
Previous Message Martijn van Oosterhout 2006-06-26 14:48:57 Re: "Truncated" tuples for tuple hash tables