Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6
Date: 2019-04-30 07:05:52
Message-ID: 20190430070552.jzqgcy4ihalx7nur@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2019-04-30 00:50:20 -0400, Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > On April 29, 2019 9:37:33 PM PDT, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> Seems like putting reindexes of pg_class into a test script that runs
> >> in parallel with other DDL wasn't a hot idea.
>
> > Saw that. Will try to reproduce (and if necessary either run separately or revert). But isn't that somewhat broken? They're not run in a transaction, so the locking shouldn't be deadlock prone.
>
> Hm? REINDEX INDEX is deadlock-prone by definition, because it starts
> by opening/locking the index and then it has to open/lock the index's
> table. Every other operation locks tables before their indexes.

We claim to have solved that:

/*
* ReindexIndex
* Recreate a specific index.
*/
void
ReindexIndex(RangeVar *indexRelation, int options, bool concurrent)

/*
* Find and lock index, and check permissions on table; use callback to
* obtain lock on table first, to avoid deadlock hazard. The lock level
* used here must match the index lock obtained in reindex_index().
*/
indOid = RangeVarGetRelidExtended(indexRelation,
concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock,
0,
RangeVarCallbackForReindexIndex,
(void *) &heapOid);

and I don't see an obvious hole in the general implementation. Minus the
comment that code exists back to 9.4.

I suspect the problem isn't REINDEX INDEX in general, it's REINDEX INDEX
over catalog tables modified during reindex. The callback acquires a
ShareLock lock on the index's table, but *also* during the reindex needs
a RowExclusiveLock on pg_class, etc. E.g. in RelationSetNewRelfilenode()
on pg_class, and on pg_index in index_build(). Which means there's a
lock-upgrade hazard (Share to RowExclusive - well, that's more a
side-grade, but still deadlock prone).

I can think of ways to fix that (e.g. if reindex is on pg_class or
index, use SHARE ROW EXCLUSIVE, rather than SHARE), but we'd probably
not want to backpatch that.

I'll try to reproduce tomorrow.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2019-04-30 07:13:37 Re: performance regression when filling in a table
Previous Message Paul Guo 2019-04-30 06:33:47 Re: standby recovery fails (tablespace related) (tentative patch and discussion)