Re: Why does CREATE INDEX CONCURRENTLY need two scans?

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Joshua Ma <josh(at)benchling(dot)com>
Cc: PostgreSQL mailing lists <pgsql-general(at)postgresql(dot)org>
Subject: Re: Why does CREATE INDEX CONCURRENTLY need two scans?
Date: 2015-04-01 02:08:37
Message-ID: CAB7nPqSWkNm0UveY6xnr=cn4X9LS469NdHiG40P2XeH7VfHxOA@mail.gmail.com
Lists: pgsql-general

On Wed, Apr 1, 2015 at 9:43 AM, Joshua Ma <josh(at)benchling(dot)com> wrote:

> Hi all,
>
> I was curious about why CONCURRENTLY needs two scans to complete - from
> the documentation on HOT (access/heap/README.HOT), it looks like the
> process is:
>
> 1) insert pg_index entry, wait for relevant in-progress txns to finish
> (before marking index open for inserts, so HOT updates won't write
> incorrect index entries)
> 2) build index in 1st snapshot, mark index open for inserts
> 3) in 2nd snapshot, validate index and insert missing tuples since first
> snapshot, mark index valid for searches
>
> Why are two scans necessary? What would break if it did something like the
> following?
>
> 1) insert pg_index entry, wait for relevant txns to finish, mark index
> open for inserts
>
> 2) build index in a single snapshot, mark index valid for searches
>
> Wouldn't new inserts update the index correctly? Between the snapshot and
> index-updating txns afterwards, wouldn't all updates be covered?
>

When an index is built with index_build, only the tuples seen at the start
of the first scan are included in the index. A second scan is needed to add
index entries for the tuples that were inserted into the table during the
build phase.
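To make this concrete, here is a toy Python model. It is hypothetical and is not PostgreSQL's code: it ignores the waits for in-progress transactions, HOT chains and real MVCC visibility. Rows are appended to a list, a "snapshot" is just the row count at the moment it is taken, keys are assumed unique, and the names (ToyTable, build_scan, validate_scan, index_ready) are invented for illustration. Consider a row that arrives after the first snapshot but before the index is marked ready for inserts. The build scan cannot see it, and its writer does not touch the index, so only the second (validation) scan adds it.

```python
"""Toy model of why a concurrent index build needs a second (validation) scan.

Hypothetical simplification, NOT PostgreSQL's implementation: rows are
(tid, key) pairs in an append-only list, keys are unique, and a "snapshot"
is simply how many rows existed when it was taken.
"""
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    rows: list = field(default_factory=list)   # (tid, key) pairs, append-only
    index: dict = field(default_factory=dict)  # key -> tid
    index_ready: bool = False                  # do writers maintain the index yet?

    def insert(self, key):
        tid = len(self.rows)
        self.rows.append((tid, key))
        # Writers only add index entries once the index is open for inserts.
        if self.index_ready:
            self.index[key] = tid
        return tid

    def snapshot(self):
        # Rows appended after this point are invisible to a scan using it.
        return len(self.rows)

    def build_scan(self, snap):
        # First scan: index only the tuples visible in the first snapshot.
        for tid, key in self.rows[:snap]:
            self.index[key] = tid

    def validate_scan(self, snap):
        # Second scan: add every visible tuple the index is still missing.
        added = 0
        for tid, key in self.rows[:snap]:
            if key not in self.index:
                self.index[key] = tid
                added += 1
        return added


def run_demo():
    t = ToyTable()
    for key in ("a", "b"):
        t.insert(key)
    snap1 = t.snapshot()   # first snapshot, used by the build scan
    t.insert("c")          # concurrent insert: invisible to snap1, index not ready
    t.build_scan(snap1)
    t.index_ready = True   # from here on, writers maintain the index themselves
    t.insert("d")          # indexed directly by its writer
    missing = sorted(k for _, k in t.rows if k not in t.index)
    added = t.validate_scan(t.snapshot())  # second snapshot
    return missing, added, sorted(t.index)


print(run_demo())  # (['c'], 1, ['a', 'b', 'c', 'd'])
```

In this model, after the first scan the index lacks only "c". That is the row inserted "during the build phase". The validation scan under a fresh snapshot is what closes that gap before the index could be marked valid for searches.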
--
Michael
