Re: Support for REINDEX CONCURRENTLY

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-09-26 10:34:00
Message-ID: 20130926103400.GA2471420@alap2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2013-09-26 12:13:30 +0900, Michael Paquier wrote:
> > 2) I don't think the drop algorithm used now is correct. Your
> > index_concurrent_set_dead() sets both indisvalid = false and indislive =
> > false at the same time. It does so after doing a WaitForVirtualLocks() -
> > but that's not sufficient. Between waiting and setting indisvalid =
> > false another transaction could start which then would start using that
> > index. Which will not get updated anymore by other concurrent backends
> > because of inislive = false.
> > You really need to follow index_drop's lead here and first unset
> > indisvalid then wait till nobody can use the index for querying anymore
> > and only then unset indislive.

> Sorry, I do not follow you here. index_concurrent_set_dead calls
> index_set_state_flags that sets indislive and *indisready* to false,
> not indisvalid. The concurrent index never uses indisvalid = true so
> it can never be called by another backend for a read query. The drop
> algorithm is made to be consistent with DROP INDEX CONCURRENTLY btw.

That makes it even worse... You can do the concurrent drop only in the
following steps:
1) set indisvalid = false, no future relcache lookups will have it as valid
2) now wait for all transactions that potentially still can use the index for
*querying* to finish. During that indisready *must* be true,
otherwise the index will have outdated contents.
3) Mark the index as indislive = false, indisready = false. Anything
using a newer relcache entry will now not update the index.
4) Wait till all potential updaters of the index have finished.
5) Drop the index.

With the patch's current scheme concurrent queries that use plans using
the old index will get wrong results (at least in read committed)
because concurrent writers will not update it anymore since it's marked
indisready = false.
This isn't a problem of the *new* index, it's a problem of the *old*
one.

Am I missing something?

> > 3) I am not sure if the swap algorithm used now actually is correct
> > either. We have mvcc snapshots now, right, but we're still potentially
> > taking separate snapshot for individual relcache lookups. What's
> > stopping us from temporarily ending up with two relcache entries with
> > the same relfilenode?
> > Previously you swapped names - I think that might end up being easier,
> > because having names temporarily confused isn't as bad as two indexes
> > manipulating the same file.

> Actually, performing swap operation with names proves to be more
> difficult than it looks as it makes necessary a moment where both the
> old and new indexes are marked as valid for all the backends.

But that doesn't make the current method correct, does it?

> The only
> reason for that is that index_set_state_flag assumes that a given xact
> has not yet done any transactional update when it is called, forcing
> to one the number of state flag that can be changed inside a
> transaction. This is a safe method IMO, and we shouldn't break that.

Part of that reasoning comes from the non-mvcc snapshot days, so it's
not really up to date anymore.
Even if you don't want to go through all that logic - which I'd
understand quite well - you can just do it like:
1) start with: old index: valid, ready, live; new index: invalid, ready, live
2) one transaction: switch names from real_name => tmp_name, new_name =>
real_name
3) one transaction: mark real_name (which is the rebuilt index) as valid,
and new_name as invalid

Now, if we fail in the midst of 3, we'd have two indexes marked as
valid. But that's unavoidable as long as you don't want to use
transactions.
Alternatively you could pass in a flag to use transactional updates,
that should now be safe.

At least, unless the old index still has "indexcheckxmin = true" with an
xmin that's not old enough. But in that case we cannot do the concurrent
reindex at all I think since we rely on the old index to be full valid.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2013-09-26 11:40:40 Re: Support for REINDEX CONCURRENTLY
Previous Message Pavan Deolasee 2013-09-26 09:53:30 Re: pgbench filler columns