Re: Support for REINDEX CONCURRENTLY

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-01-23 18:41:34
Message-ID: 20130123184134.GE7048@awork2.anarazel.de
Lists: pgsql-hackers

On 2013-01-15 18:16:59 +0900, Michael Paquier wrote:
> OK. I am back to this patch after a too long time.

Ditto ;)

> > > > * would be nice (but thats probably a step #2 thing) to do the
> > > > individual steps of concurrent reindex over multiple relations to
> > > > avoid too much overall waiting for other transactions.
> > > >
> > > I think I did that by now using one transaction per index for each
> > > operation except the drop phase...
> >
> > Without yet having read the new version, I think that's not what I
> > meant. There currently is a wait for concurrent transactions to end
> > after most of the phases for every relation, right? If you have a busy
> > database with somewhat long-running transactions, that's going to slow
> > everything down with waiting quite a bit. I wondered whether it would make
> > sense to do PHASE1 for all indexes in all relations, then wait once,
> > then PHASE2...
>
> > That obviously has some space and index maintenance overhead issues, but
> > it's probably sensible anyway in many cases.
> >
> OK, phase 1 is done with only one transaction for all the indexes. Do you
> mean that we should do that with a single transaction for each index?

Yes.

> > Isn't the following block content that's mostly available somewhere else
> > already?
> > [... doc extract ...]
> >
> Yes, this portion of the docs is pretty similar to what can be found under
> CREATE INDEX CONCURRENTLY. Why not create a new common documentation
> section that CREATE INDEX CONCURRENTLY and REINDEX CONCURRENTLY could refer
> to? I think we should first work on the code and then do the docs properly,
> though.

Agreed. I just noticed it when scrolling through the patch.

> > > -	if (concurrent && is_exclusion)
> > > +	if (concurrent && is_exclusion && !is_reindex)
> > > 		ereport(ERROR,
> > > 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> > > 				 errmsg_internal("concurrent index creation for exclusion constraints is not supported")));
> >
> > This is what I referred to above wrt REINDEX and CONCURRENTLY. We
> > shouldn't pass concurrently if we don't deem it to be safe for exclusion
> > constraints.
> >
> So does that mean that it is not possible to create an exclusion constraint
> in a concurrent context?

Yes, it's currently not safe in the general case.

> The code path used by REINDEX CONCURRENTLY permits creating an index in
> parallel to an existing one, not a completely new index. Shouldn't this
> work for indexes used by exclusion constraints also?

But that fact might make things safe. I don't immediately see any reason that
adding a
    if (!indisvalid)
        return;
to check_exclusion_constraint wouldn't be sufficient if there's another
index with an equivalent definition.
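For the listeners, the idea can be modeled in a minimal standalone sketch. The stub type and counter below are invented for illustration (the real function is check_exclusion_constraint in the backend and works on pg_index entries); the point is only that enforcement is skipped while indisvalid is false, because an equivalent valid index still enforces the constraint:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the proposed early exit: while an index is being
 * rebuilt concurrently it is marked indisvalid = false, and the constraint
 * is still enforced through the equivalent valid index, so enforcement via
 * the invalid copy can simply be skipped.
 */
typedef struct IndexStub
{
	bool	indisvalid;		/* pg_index.indisvalid analogue */
	int		checks_run;		/* counts enforcement calls, for illustration */
} IndexStub;

static void
check_exclusion_constraint_stub(IndexStub *index)
{
	if (!index->indisvalid)
		return;				/* proposed guard: skip invalid indexes */
	index->checks_run++;	/* normal enforcement path would go here */
}
```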

> > > +	/*
> > > +	 * Phase 2 of REINDEX CONCURRENTLY
> > > +	 *
> > > +	 * Build concurrent indexes in a separate transaction for each index to
> > > +	 * avoid having open transactions for an unnecessarily long time. We also
> > > +	 * need to wait until no running transactions could have the parent table
> > > +	 * of index open. A concurrent build is done for each concurrent
> > > +	 * index that will replace the old indexes.
> > > +	 */
> > > +
> > > +	/* Get the first element of concurrent index list */
> > > +	lc2 = list_head(concurrentIndexIds);
> > > +
> > > +	foreach(lc, indexIds)
> > > +	{
> > > +		Relation	indexRel;
> > > +		Oid			indOid = lfirst_oid(lc);
> > > +		Oid			concurrentOid = lfirst_oid(lc2);
> > > +		Oid			relOid;
> > > +		bool		primary;
> > > +		LOCKTAG    *heapLockTag = NULL;
> > > +		ListCell   *cell;
> > > +
> > > +		/* Move to next concurrent item */
> > > +		lc2 = lnext(lc2);
> > > +
> > > +		/* Start new transaction for this index concurrent build */
> > > +		StartTransactionCommand();
> > > +
> > > +		/* Get the parent relation Oid */
> > > +		relOid = IndexGetRelation(indOid, false);
> > > +
> > > +		/*
> > > +		 * Find the locktag of parent table for this index, we need to wait
> > > +		 * for locks on it.
> > > +		 */
> > > +		foreach(cell, lockTags)
> > > +		{
> > > +			LOCKTAG    *localTag = (LOCKTAG *) lfirst(cell);
> > > +			if (relOid == localTag->locktag_field2)
> > > +				heapLockTag = localTag;
> > > +		}
> > > +
> > > +		Assert(heapLockTag && heapLockTag->locktag_field2 != InvalidOid);
> > > +		WaitForVirtualLocks(*heapLockTag, ShareLock);
> >
> > Why do we have to do the WaitForVirtualLocks here? Shouldn't we do this
> > once for all relations after each phase? Otherwise the waiting time will
> > really start to hit when you do this on a somewhat busy server.
> >
> Each new index is built and set as ready in a separate transaction, so it
> makes sense to wait for the parent relation each time. It is possible to
> wait for a parent relation only once during this phase, but in this case
> all the indexes of the same relation need to be set as ready in the same
> transaction. So here the choice is either to wait for the same relation
> multiple times, once per index, or to wait once per parent relation but
> build all the concurrent indexes within the same transaction. Choice 1
> makes the code clearer and more robust to my mind, as phase 2 is done
> clearly for each index separately. Thoughts?

As far as I understand that code, its purpose is to enforce that all
potential users have an up-to-date definition available. For that we
acquire a lock on all virtualxids of users of that table, thus waiting
for them to finish.
Consider the scenario where you have a workload where most transactions
are fairly long (say 10min) and use the same tables (a,b)/indexes(a_1,
a_2, b_1, b_2). With the current strategy you will do:

WaitForVirtualLocks(a_1) -- wait up to 10min
index_build(a_1)
WaitForVirtualLocks(a_2) -- wait up to 10min
index_build(a_2)
...

So instead of waiting up to 10 minutes for that phase you have to wait up
to 40.
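The arithmetic above can be made concrete with a toy model. The function names and parameters here are invented purely for illustration (4 indexes, transactions lasting up to 10 minutes, as in the example):

```c
#include <assert.h>

/*
 * Worst-case wait if we wait once per index: every wait can last up to
 * max_xact_minutes, and the waits add up across indexes.
 */
static int
wait_per_index(int nindexes, int max_xact_minutes)
{
	int		total = 0;

	for (int i = 0; i < nindexes; i++)
		total += max_xact_minutes;	/* one WaitForVirtualLocks per index */
	return total;
}

/*
 * Worst-case wait if the phase waits once for all indexes up front: a
 * single wait covers every index built afterwards.
 */
static int
wait_per_phase(int nindexes, int max_xact_minutes)
{
	(void) nindexes;				/* one wait, regardless of index count */
	return max_xact_minutes;
}
```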

> > > + indexRel = index_open(indOid, ShareUpdateExclusiveLock);
> >
> > I wonder whether we should directly open it exclusively here, given it's
> > going to be opened exclusively in a bit anyway. Not that that will really
> > reduce the deadlock likelihood, since we already hold the
> > ShareUpdateExclusiveLock in session mode ...
> >
> I tried to use an AccessExclusiveLock here, but it happens that this is not
> compatible with index_set_state_flags. Does taking an exclusive lock
> increment the transaction ID of the running transaction? Because what I am
> seeing is that taking an AccessExclusiveLock on this index does a
> transaction update.

Yep, it does when wal_level = hot_standby, because it logs the exclusive
lock to WAL so the startup process on the standby can acquire it.

Imo that Assert needs to be moved to the existing callsites if there
isn't an equivalent one already.

> For those reasons current code sticks with ShareUpdateExclusiveLock. Not a
> big deal btw...

Well, lock upgrades make deadlocks more likely.

OK, on to v7:
+ */
+void
+index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid)
...
+	/*
+	 * Take a lock on the old and new index before switching their names. This
+	 * avoids having index swapping rely on the relation renaming mechanism to
+	 * get a lock on the relations involved.
+	 */
+	oldIndexRel = relation_open(oldIndexOid, AccessExclusiveLock);
+	newIndexRel = relation_open(newIndexOid, AccessExclusiveLock);
..
+	/*
+	 * If the swapped index is a toast index, take an exclusive lock on its
+	 * parent toast relation and then update reltoastidxid to the new index
+	 * Oid value.
+	 */
+	if (get_rel_relkind(parentOid) == RELKIND_TOASTVALUE)
+	{
+		Relation	pg_class;
+
+		/* Open pg_class and fetch a writable copy of the relation tuple */
+		pg_class = heap_open(parentOid, RowExclusiveLock);
+
+		/* Update the statistics of this pg_class entry with new toast index Oid */
+		index_update_stats(pg_class, false, false, newIndexOid, -1.0);
+
+		/* Close parent relation */
+		heap_close(pg_class, RowExclusiveLock);
+	}

ISTM the RowExclusiveLock on the toast table should be acquired before
the locks on the indexes.
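The rule behind this remark is the classic deadlock-avoidance one: every backend should acquire its locks in a single, global order, so no two backends can ever wait on each other in a cycle. A minimal standalone illustration, with an invented helper that orders lock targets by Oid before acquisition (the backend does not literally sort Oids like this; the point is only the fixed ordering):

```c
#include <assert.h>
#include <stdlib.h>

typedef unsigned int Oid;

/* Comparator for ascending Oid order. */
static int
oid_cmp(const void *a, const void *b)
{
	Oid		oa = *(const Oid *) a;
	Oid		ob = *(const Oid *) b;

	return (oa > ob) - (oa < ob);
}

/*
 * Hypothetical helper: sort the lock targets so that every backend
 * acquires them in the same global order, ruling out lock-order cycles
 * between two backends locking the same set of relations.
 */
static void
order_lock_targets(Oid *oids, int n)
{
	qsort(oids, n, sizeof(Oid), oid_cmp);
}
```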

+index_concurrent_set_dead(Oid indexId, Oid heapId, LOCKTAG locktag)
+{
+	Relation	heapRelation;
+	Relation	indexRelation;
+
+	/*
+	 * Now we must wait until no running transaction could be using the
+	 * index for a query. To do this, inquire which xacts currently would
+	 * conflict with AccessExclusiveLock on the table -- ie, which ones
+	 * have a lock of any kind on the table. Then wait for each of these
+	 * xacts to commit or abort. Note we do not need to worry about xacts
+	 * that open the table for reading after this point; they will see the
+	 * index as invalid when they open the relation.
+	 *
+	 * Note: the reason we use actual lock acquisition here, rather than
+	 * just checking the ProcArray and sleeping, is that deadlock is
+	 * possible if one of the transactions in question is blocked trying
+	 * to acquire an exclusive lock on our table. The lock code will
+	 * detect deadlock and error out properly.
+	 *
+	 * Note: GetLockConflicts() never reports our own xid, hence we need
+	 * not check for that. Also, prepared xacts are not reported, which
+	 * is fine since they certainly aren't going to do anything more.
+	 */
+	WaitForVirtualLocks(locktag, AccessExclusiveLock);

Most of that comment seems to belong to WaitForVirtualLocks instead of
this specific caller of WaitForVirtualLocks.

A comment in the header that it is doing the waiting would also be good.

In ReindexRelationsConcurrently I suggest s/bypassing/skipping/.

Btw, seeing that we have an indisvalid check for the toast table's index, do
we have any way to clean up such a dead index? I don't think it's allowed
to drop the index of a toast table. I.e. we possibly need to relax that
check for invalid indexes :/.

I think the usage of list_append_unique_oids in
ReindexRelationsConcurrently might get too expensive in larger
schemas. It's O(n^2) in the current usage, and schemas with lots of
relations/indexes aren't unlikely candidates for this feature.
The easiest solution probably is to use a hashtable.
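In the backend the natural choice would be dynahash (hash_create/hash_search); a standalone sketch of the idea, with an invented fixed-size open-addressing set over Oids, shows how the O(n^2) list scans turn into amortized O(1) membership checks:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef unsigned int Oid;

#define TABLE_SIZE 1024			/* power of two, larger than the oid count */

/*
 * Toy open-addressing hash set for Oids. The table size, hash constant
 * and linear probing are illustrative choices, not backend code.
 */
typedef struct OidSet
{
	Oid		slots[TABLE_SIZE];
	bool	used[TABLE_SIZE];
	int		count;
} OidSet;

static void
oidset_init(OidSet *set)
{
	memset(set, 0, sizeof(*set));
}

/* Returns true if the oid was newly added, false if already present. */
static bool
oidset_add(OidSet *set, Oid oid)
{
	unsigned int pos = (oid * 2654435761u) & (TABLE_SIZE - 1);

	while (set->used[pos])
	{
		if (set->slots[pos] == oid)
			return false;		/* duplicate: O(1) expected, not a list scan */
		pos = (pos + 1) & (TABLE_SIZE - 1);
	}
	set->used[pos] = true;
	set->slots[pos] = oid;
	set->count++;
	return true;
}
```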

ReindexRelationsConcurrently should do a CHECK_FOR_INTERRUPTS() every
once in a while; it's currently not gracefully interruptible, which
probably is bad in a bigger schema.
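In the backend that just means calling CHECK_FOR_INTERRUPTS() inside the per-index loops, which ereport()s when a cancel or die signal is pending. A standalone model of the pattern, with a flag and early exit standing in for the real mechanism (names invented here):

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for QueryCancelPending / ProcDiePending. */
static volatile bool interrupt_pending = false;

/*
 * Model of a long per-index loop that checks for interrupts on each
 * iteration. Returns how many indexes were processed before the loop
 * noticed a pending interrupt and bailed out.
 */
static int
process_indexes(int nindexes)
{
	int		done = 0;

	for (int i = 0; i < nindexes; i++)
	{
		if (interrupt_pending)	/* CHECK_FOR_INTERRUPTS() analogue */
			break;
		done++;					/* per-index reindex work would go here */
	}
	return done;
}
```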

That's all I have for now.

This patch is starting to look seriously cool, and it seems realistic to
get it into a ready state for 9.3.

I somewhat dislike the fact that CONCURRENTLY isn't really concurrent
here (for the listeners: swapping the indexes acquires exclusive locks),
but I don't see any other naming being better.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
