Concurrent CREATE INDEX, try 2 (was Re: Reducing relation locking overhead)

From: Hannu Krosing <hannu(at)skype(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject: Concurrent CREATE INDEX, try 2 (was Re: Reducing relation locking overhead)
Date: 2005-12-05 14:03:30
Message-ID: 1133791410.3628.34.camel@localhost.localdomain
Lists: pgsql-hackers

On Friday, 2005-12-02 at 02:14, Tom Lane wrote:
> Greg Stark <gsstark(at)mit(dot)edu> writes:
> > It was a *major* new feature that many people were waiting for when Oracle
> > finally implemented live CREATE INDEX and REINDEX. The ability to run create
> > an index without blocking any operations on a table, even updates, was
> > absolutely critical for 24x7 operation.
>
> Well, we're still not in *that* ballpark and I haven't seen any serious
> proposals to make us so. How "absolutely critical" is it really?
> Is REINDEX-in-parallel-with-reads-but-not-writes, which is what we
> actually have at the moment, an "absolutely critical" facility?

I don't think REINDEX is very critical (except if for some reason you
have failed to vacuum a high-traffic table for some time and need to get
index sizes back down).

OTOH, being able to add indexes on a live database is sometimes
required, and neither of the current options (accept a few hours of
downtime, or use a Slony replica and do a switchover) is always
acceptable. This capability is reportedly present in MSSQL and is
available for Oracle if you buy the more expensive Enterprise Edition.

So, after more thinking, I have come up with a proposal for a fully
concurrent (read+write) CREATE INDEX, which should need a minimal
amount of locking.

Concurrent CREATE INDEX
========================

Concurrent index NDX1 on table TAB1 is created like this:

1) start a transaction. Take a snapshot SNAP1

1.1) optionally, remove pages for TAB1 from the FSM to force (?) all
newer inserts/updates to happen at the end of the table (won't work for
in-page updates without code changes)

2) create the index as we do now, but only over pages that are visible
in SNAP1

3) record the index in pg_class, but mark it as "do not use for lookups"
in a new field. Take snapshot SNAP2. Commit the transaction.

-- at this point all new inserts and updates will be recorded in NDX1

4) run a full scan over TAB1 and add to NDX1 all rows that are visible
in SNAP2 but not in SNAP1. (If there is some way (like p. 1.1) to
restrict or record the area of the heap that new tuples go to, this can
be done more efficiently than a full scan.)

5) record the status of the index as "ready for use".

-- now the index is fully created and usable
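To make the two-snapshot idea concrete, here is a toy model of steps
1-5 in Python. A "snapshot" is reduced to the set of transaction ids
whose rows are visible at that moment, and the Table, Index, and
create_index_concurrently names are all illustrative inventions, not
PostgreSQL internals; real snapshots and visibility rules are far more
involved.

```python
class Table:
    """Heap as an append-only list of (xid, key) pairs."""
    def __init__(self):
        self.rows = []

    def insert(self, xid, key):
        self.rows.append((xid, key))

class Index:
    def __init__(self):
        self.keys = set()
        self.ready = False   # the "do not use for lookups" flag (step 3/5)

def create_index_concurrently(table, live_inserts):
    ndx = Index()
    # Step 1: take SNAP1 -- everything inserted so far is visible.
    snap1 = {xid for xid, _ in table.rows}
    # Step 2: build the index only from rows visible in SNAP1.
    for xid, key in table.rows:
        if xid in snap1:
            ndx.keys.add(key)
    # Step 3: the index is now cataloged (but not ready); from this
    # point, concurrent writers maintain it themselves.
    for xid, key in live_inserts:
        table.insert(xid, key)
        ndx.keys.add(key)          # writers see and update the new index
    snap2 = {xid for xid, _ in table.rows}
    # Step 4: catch-up scan for rows visible in SNAP2 but not in SNAP1
    # (covers the window between the two snapshots; inserting into a
    # set makes duplicates harmless).
    for xid, key in table.rows:
        if xid in snap2 and xid not in snap1:
            ndx.keys.add(key)
    # Step 5: mark the index ready for lookups.
    ndx.ready = True
    return ndx
```

In this toy the catch-up pass is redundant because the simulated writers
already maintain the index, but it shows where the SNAP2-minus-SNAP1
scan slots in: it exists to pick up rows written by transactions that
committed between taking SNAP1 and cataloging the index.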

This is in no way a final proposal, but rather a starting point for
discussion of how things might be doable. For example, p. 3 is probably
tricky to do in a way that all backends pick up at the right time. It
will require reviewing most places that do table updates to make sure
they check for new indexes.

Any comments are appreciated.

-------------
Hannu Krosing
