Why does CREATE INDEX CONCURRENTLY need two scans?

From:	Joshua Ma <josh(at)benchling(dot)com>
To:	pgsql-general(at)postgresql(dot)org
Subject:	Why does CREATE INDEX CONCURRENTLY need two scans?
Date:	2015-04-01 00:43:39
Message-ID:	CAG9XPVn_oYgssW5W7K5zAq6fa5RiTOGCLMnE7rovW-3fe9a3fw@mail.gmail.com
Lists:	pgsql-general

Hi all,

I'm curious why CREATE INDEX CONCURRENTLY needs two scans to complete. From
the documentation on HOT (access/heap/README.HOT), it looks like the process is:

1) insert pg_index entry, wait for relevant in-progress txns to finish
(before marking index open for inserts, so HOT updates won't write
incorrect index entries)
2) build index in 1st snapshot, mark index open for inserts
3) in 2nd snapshot, validate index and insert missing tuples since first
snapshot, mark index valid for searches
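
As a rough illustration of the three steps listed above, here is a toy
in-memory model in Python. It is only a sketch of the flow as described in
this message, not PostgreSQL code: the class ToyTable, the function
create_index_concurrently, and its during_build/after_ready parameters are
hypothetical names made up for the example. The two flags borrow the names of
the pg_index columns indisready and indisvalid. Snapshots, locking, and the
wait for in-progress transactions are all reduced to plain list copies.

```python
# Toy model of the three steps described above; not PostgreSQL code.
class ToyTable:
    def __init__(self, rows):
        self.heap = list(rows)      # all committed rows, in insertion order
        self.index = set()          # rows the index currently points at
        self.indisready = False     # writers must maintain the index
        self.indisvalid = False     # queries may use the index

    def insert(self, row):
        self.heap.append(row)
        if self.indisready:         # writers only maintain a "ready" index
            self.index.add(row)

    def snapshot(self):
        return list(self.heap)      # rows visible at this instant


def create_index_concurrently(table, during_build=(), after_ready=()):
    # Step 1: the catalog entry exists, but the index is neither ready
    # nor valid. (The wait for in-progress transactions is not modeled.)
    table.indisready = False
    table.indisvalid = False

    # Step 2: build from a first snapshot. Rows inserted while the build
    # runs are not indexed, because the index is not yet open for inserts.
    snap1 = table.snapshot()
    for row in during_build:
        table.insert(row)
    table.index = set(snap1)
    table.indisready = True

    # From here on, writers maintain the index themselves.
    for row in after_ready:
        table.insert(row)

    # Step 3: take a second snapshot and add whatever the first pass missed.
    snap2 = table.snapshot()
    missing = [r for r in snap2 if r not in table.index]
    table.index.update(missing)
    table.indisvalid = True
    return missing


t = ToyTable(["a", "b"])
missed = create_index_concurrently(t, during_build=["c"], after_ready=["d"])
print(missed)                             # rows only the second scan picked up
print(sorted(t.index) == sorted(t.heap))  # index covers the whole heap
```

In this model the row inserted during the build is the only one the second
pass has to add; the row inserted after the index is marked ready is already
there.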

Why are two scans necessary? What would break if it did something like the
following?

1) insert pg_index entry, wait for relevant txns to finish, mark index open
for inserts
2) build index in a single snapshot, mark index valid for searches

Wouldn't new inserts update the index correctly? Between the rows visible in
the snapshot and the txns that update the index afterwards, wouldn't all
changes be covered?

To be clear, I'm not trying to suggest any changes, just wondering what's
missing from my mental model. :)

Thanks!
Josh

Responses

Re: Why does CREATE INDEX CONCURRENTLY need two scans? at 2015-04-01 02:08:37 from Michael Paquier
