Re: HOT documentation README

From: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Bruce Momjian" <bruce(at)momjian(dot)us>, "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: HOT documentation README
Date: 2007-09-12 06:49:31
Message-ID: 2e78013d0709112349k111dc1den7d49cb30d34599f8@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-patches

On 9/12/07, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > Current README.HOT version:
> > ftp://momjian.us/pub/postgresql/mypatches/README.HOT
>
> I've done some further editorializing and wordsmithing on this,
> and marked with XXX some areas that seem like they're still red flags
> (or at least inadequately described).

Thanks a lot.

> CREATE INDEX CONCURRENTLY
> -------------------------
>
> In the concurrent case we take a different approach. We create the
> pg_index entry immediately, before we scan the table. The pg_index entry
> is marked as "not ready for inserts". Then we commit and wait for any
> transactions which have the table open to finish. This ensures that no
> *new* HOT updates will change the key value for our new index, because all
> transactions will see the existence of the index and will respect its
> constraint on which updates can be HOT.
>
> After waiting for transactions which had the table open, we build the
> index with a snapshot. Because we waited for anyone who existed before
> the index was created, any tuples seen in the snapshot will have only
> valid forward-growing HOT chains. (They might still have older HOT
> updates behind them which are broken.) As above, we point the index
> pointer at the root of the HOT-update chain but we use the key from the
> end of the chain.
>
> We mark the index open for inserts (but still not ready for reads) then
> we again wait for transactions which have the table open. Then we take
> a second reference snapshot and validate the index. This searches for
> tuples missing from the index --- but it again has to look up the root
> of the HOT chains and search for those tuples in the index.
>
> Then we wait until every transaction that could have a snapshot older than
> the second reference snapshot is finished. This ensures that nobody is
> alive any longer who could possibly see any of the older tuples in a
> broken HOT chain. (This condition is actually stronger than what is
> needed to protect the broken HOT chains --- the problem we are dealing
> with is tuples inserted after we took our first reference snapshot.
> Those will be part of valid HOT-chains, but might not be in the index.)
>
> XXX the above being the case, why should we bother with all this extra
> mechanism? Seems we could dispense with "not ready for inserts".
>
>

The "not ready for inserts" logic is necessary because we have separated
the steps of creating the catalog entry for the new index and actual
bottom-up
building of the index into separate transactions. My worry was once the
catalog
entry is committed, an index insert won't work with the concurrent index
build.

I guess I also need to explain why we need to separate the catalog entry
creation and index building activities into separate transactions. The
primary
purpose is to handle existing broken HOT chains and avoid any new
broken chains being created while we are building the index.

Without this additional step, while we are building the index in the first
phase
using a snapshot, its possible that a broken HOT chain may get created
forwarding to the tuple that we index. For example:

A table with two columns a and b. Say, we already have an index on 'a'
and creating a new index on 'b'. Say, there is a tuple (0, 1): (1, 11) in
the table where (0, 1) in the line pointer of the tuple.
This is the visible tuple when we took snapshot in the first phase and so
we shall index this tuple with key 11 and TID (0, 1). But before the first
phase
finishes, the tuple is updated to (0, 2): (1, 22). This would be a HOT
update
resulting in a broken chain. The root line pointer for the new tuple is
still (0, 1).
CIC now waits for the updating transaction,
takes a fresh snapshot and starts the second phase. In the second phase,
it must index the new tuple (1, 22) using the key 22 and TID (0, 1) because
thats the root line pointer for the chain. Now there is already an index
entry
in the new index with key 11 and TID (0, 1). Inserting another index entry
with
a different key and the same TID would certainly corrupt the index.

We solve this issue by first creating a catalog entry for the new index and
committing the transaction. We then wait
out for any conflicting transactions before taking snapshot for building the
index in the first phase. This arrangement guarantees that the snapshot
used in the first phase would always satisfy tuples at the head of the
existing HOT chains. At the same time, no new broken HOT chains are
created while we are building the index because the new index
(even though not yet ready) is consulted for all subsequent UPDATEs.

Hope this helps.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-patches by date

  From Date Subject
Next Message Pavan Deolasee 2007-09-12 07:09:29 Re: HOT documentation README
Previous Message Tatsuo Ishii 2007-09-12 03:43:56 Re: invalidly encoded strings