Re: Support for REINDEX CONCURRENTLY

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-09-26 03:13:30
Message-ID: CAB7nPqStv+6CrRXWAn70i7q6pYSTwFRXPRUZ8gEpv0gXGdDScQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Sorry for late reply, I am coming back poking at this patch a bit. One
of the things that I am still unhappy with this patch are the
potential deadlocks that can come up when for example another backend
kicks another operation taking ShareUpdateExclusiveLock (ANALYZE or
another REINDEX CONCURRENTLY) on the same relation as the one
reindexed concurrently. This can happen because we need to wait at
index validation phase as process might not have taken into account
deleted tuples before the reference snapshot was taken. I played a
little bit with a version of the code using no old snapshot waiting,
but even if I couldn't break it directly, concurrent backends
sometimes took incorrect tuples from heap. I unfortunately have no
clear solution about how to solve that... Except making REINDEX
CONCURRENTLY fail when validating the concurrent index with a clear
error message not referencing to any deadlock, giving priority to
other processes like for example ANALYZE, or other backends ready to
kick another REINDEX CONCURRENTLY... Any ideas here are welcome, the
patch attached does the implementation mentioned here.

On Tue, Sep 17, 2013 at 12:37 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> Looking at this version of the patch now:
> 1) comment for "Phase 4 of REINDEX CONCURRENTLY" ends with an incomplete
> sentence.
Oops, thanks.

> 2) I don't think the drop algorithm used now is correct. Your
> index_concurrent_set_dead() sets both indisvalid = false and indislive =
> false at the same time. It does so after doing a WaitForVirtualLocks() -
> but that's not sufficient. Between waiting and setting indisvalid =
> false another transaction could start which then would start using that
> index. Which will not get updated anymore by other concurrent backends
> because of inislive = false.
> You really need to follow index_drop's lead here and first unset
> indisvalid then wait till nobody can use the index for querying anymore
> and only then unset indislive.
Sorry, I do not follow you here. index_concurrent_set_dead calls
index_set_state_flags that sets indislive and *indisready* to false,
not indisvalid. The concurrent index never uses indisvalid = true so
it can never be called by another backend for a read query. The drop
algorithm is made to be consistent with DROP INDEX CONCURRENTLY btw.

> 3) I am not sure if the swap algorithm used now actually is correct
> either. We have mvcc snapshots now, right, but we're still potentially
> taking separate snapshot for individual relcache lookups. What's
> stopping us from temporarily ending up with two relcache entries with
> the same relfilenode?
> Previously you swapped names - I think that might end up being easier,
> because having names temporarily confused isn't as bad as two indexes
> manipulating the same file.
Actually, performing swap operation with names proves to be more
difficult than it looks as it makes necessary a moment where both the
old and new indexes are marked as valid for all the backends. The only
reason for that is that index_set_state_flag assumes that a given xact
has not yet done any transactional update when it is called, forcing
to one the number of state flag that can be changed inside a
transaction. This is a safe method IMO, and we shouldn't break that.
Also, as far as I understood, this is something that we *want* to
avoid to a REINDEX CONCURRENTLY process that might fail and come up
with a double number of valid indexes for a given relation if it is
performed for a table (or an index if reindex is done on an index).
This is also a requirement for toast indexes where new code assumes
that a toast relation can only have one single valid index at the same
time. For those reasons the relfilenode approach is better.

Regards,
--
Michael

Attachment Content-Type Size
20130926_0_procarray.patch application/octet-stream 13.6 KB
20130926_1_index_struct.patch application/octet-stream 7.6 KB
20130926_2_reindex_conc_v31.patch application/octet-stream 66.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2013-09-26 03:14:13 Re: pgbench progress report improvements - split 2
Previous Message Erik Rijkers 2013-09-25 22:34:43 Re: Minmax indexes