From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Date: 2013-11-18 12:19:58
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2013-11-18 19:52:29 +0900, Michael Paquier wrote:
> On Sat, Nov 16, 2013 at 5:09 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > On 2013-11-15 11:40:17 +0900, Michael Paquier wrote:
> >> - 20131114_3_reindex_concurrently.patch, providing the core feature.
> >> Patch 3 needs to have patch 2 applied first. Regression tests,
> >> isolation tests and documentation are included with the patch.
> >
> > Have you addressed my concurrency concerns from the last version?
> I have added documentation in the patch with a better explanation
> about why those choices of implementation are made.

The dropping still isn't safe:
After phase 4 we are in the state:
old index: valid, live, !isdead
new index: !valid, live, !isdead
Then you do a index_concurrent_set_dead() from that state on in phase 5.
There you do WaitForLockers(locktag, AccessExclusiveLock) before
That's not sufficient.

Consider what happens with the following sequence:
1) WaitForLockers(locktag, AccessExclusiveLock)
-> GetLockConflicts() => virtualxact 1
-> VirtualXactLock(1)
2) virtualxact 2 starts, opens the *old* index since it's currently the
only valid one.
3) virtualxact 1 finishes
4) index_concurrent_set_dead() does index_set_state_flags(DROP_SET_DEAD)
5) another transaction (vxid 3) starts inserting data into the relation, updates
only the new index, the old index is dead
6) vxid 2 inserts data, updates only the old index. Since it had the
index already open the cache invalidations won't be processed.

Now the indexes are out of sync. There's entries only in the old index
and there's entries only in the new index. Not good.

I hate to repeat myself, but you really need to follow the current
protocol for concurrently dropping indexes. Which involves *first*
marking the index as invalid so it won't be used for querying anymore,
then wait for everyone possibly still seing that entry to finish, and
only *after* that mark the index as dead. You cannot argue away
correctness concerns with potential deadlocks.


I am also still unconvinced that the logic in index_concurrent_swap() is
correct. It very much needs to explain why no backend can see values
that are inconsistent. E.g. what prevents a backend thinking the old and
new indexes have the same relfilenode? MVCC snapshots don't seem to
protect you against that.
I am not sure there's a problem, but there certainly needs to more
comments explaining why there are none.

Something like the following might be possible:

Backend 1: start reindex concurrently, till phase 4
Backend 2: ExecOpenIndices()
-> RelationGetIndexList (that list is consistent due to mvcc snapshots!)
Backend 2: -> index_open(old_index) (old relfilenode)
Backend 1: index_concurrent_swap()
-> CommitTransaction()
-> ProcArrayEndTransaction() (changes visible to others henceforth!)
Backend 2: -> index_open(new_index)
=> no cache invalidations yet, gets the old relfilenode
Backend 2: ExecInsertIndexTuples()
=> updates the same relation twice, corruptf
Backend 1: stil. in CommitTransaction()
-> AtEOXact_Inval() sends out invalidations


Andres Freund

Andres Freund
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Kohei KaiGai 2013-11-18 12:25:40 Re: Custom Scan APIs (Re: Custom Plan node)
Previous Message Heikki Linnakangas 2013-11-18 12:06:25 Re: Sequence Access Method WIP