Re: Online index builds

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Online index builds
Date: 2006-07-13 05:07:49
Message-ID: 871wsqjdne.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Simon Riggs <simon(at)2ndquadrant(dot)com> writes:

> On Wed, 2006-07-12 at 12:09 -0400, Greg Stark wrote:
> > no regression tests yet.
>
> We'll need some performance tests that show that lock-hold time is
> *actually* reduced, given the shenanigans needed to get there.

I'm not sure what you mean by "lock-hold time". Online index builds
effectively take *no* locks in the user-visible sense that regular index
builds do. Other transactions can insert, update, delete continuously
throughout the entire process.

The only locks that are taken are

1) a ShareUpdateExclusiveLock which blocks vacuum from running on the table
being indexed. This is taken by both phase 1 and phase 2. (Actually I had
the wrong lock in the patch I emailed in one place. Fixed in my source tree
here)

2) An ExclusiveLock that is taken momentarily and immediately released. Even
if that can never be acquired due to a busy system it can eventually
proceed anyways as long as there are no long-running transactions that are
refusing to commit.

That said we do need some performance tests to get an idea how long phase 2
takes for large tables. The additional index and heap scan and tid sort could
take a substantial amount of time though never as long as the original index
build done in phase 1.

What's worse is that in some cases the merge could potentially be doing a lot
of retail index inserts. I have no good intuition for how long those will take
relative to the wholesale index build method, especially since for some index
methods like GIN retail inserts are extremely expensive.

So for indexes that don't have a lot of records that need to be inserted
individually what I expect -- and what I put in the docs -- is something under
100% time penalty for an online index build. In fact I expect it to be more
like 50% though it depends on how wide the original index. For ones that do
have lots of records mutated for phase 2 all bets are off.

> We may need to have usage recommendations in the docs.

I'm writing docs now. I'm trying to find a happy medium between explaining all
the issues and spamming the docs with lots of discussion. Right now what I
have is a single paragraph in the create_index man page that refers to the
Postgres manual where I list the issues in more depth.

I also still have to get some kind of regression tests. I don't think we have
any concurrent regression tests currently, do we? To thoroughly test it will
be quite hard. Some of the corner cases are extremely narrow or require very
particular types of transactions running with very specific timing.

--
greg

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Joshua D. Drake 2006-07-13 05:22:40 Re: Three weeks left until feature freeze
Previous Message Thomas Hallgren 2006-07-13 04:58:09 Re: Three weeks left until feature freeze