Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6
Date: 2019-04-30 15:51:10
Message-ID: 30579.1556639470@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> On 2019-04-30 00:50:20 -0400, Tom Lane wrote:
>> Hm? REINDEX INDEX is deadlock-prone by definition, because it starts
>> by opening/locking the index and then it has to open/lock the index's
>> table. Every other operation locks tables before their indexes.

> We claim to have solved that:

Oh, okay ...

> I suspect the problem isn't REINDEX INDEX in general, it's REINDEX INDEX
> over catalog tables modified during reindex.

So far, every one of the failures in the buildfarm looks like the REINDEX
is deciding that it needs to wait for some other transaction, eg here

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2019-04-30%2014%3A43%3A11

the relevant bit of postmaster log is

2019-04-30 14:44:13.478 UTC [16135:450] pg_regress/create_index LOG: statement: REINDEX TABLE pg_class;
2019-04-30 14:44:14.478 UTC [16137:430] pg_regress/create_view LOG: process 16137 detected deadlock while waiting for AccessShareLock on relation 2662 of database 16384 after 1000.148 ms
2019-04-30 14:44:14.478 UTC [16137:431] pg_regress/create_view DETAIL: Process holding the lock: 16135. Wait queue: .
2019-04-30 14:44:14.478 UTC [16137:432] pg_regress/create_view STATEMENT: DROP SCHEMA temp_view_test CASCADE;
2019-04-30 14:44:14.478 UTC [16137:433] pg_regress/create_view ERROR: deadlock detected
2019-04-30 14:44:14.478 UTC [16137:434] pg_regress/create_view DETAIL: Process 16137 waits for AccessShareLock on relation 2662 of database 16384; blocked by process 16135.
Process 16135 waits for ShareLock on transaction 2875; blocked by process 16137.
Process 16137: DROP SCHEMA temp_view_test CASCADE;
Process 16135: REINDEX TABLE pg_class;
2019-04-30 14:44:14.478 UTC [16137:435] pg_regress/create_view HINT: See server log for query details.
2019-04-30 14:44:14.478 UTC [16137:436] pg_regress/create_view STATEMENT: DROP SCHEMA temp_view_test CASCADE;

I haven't been able to reproduce this locally yet, but my guess is that
the REINDEX wants to update some row that was already updated by the
concurrent transaction, so it has to wait to see if the latter commits
or not. And, of course, waiting while holding AccessExclusiveLock on
any index of pg_class is a Bad Idea (TM). But I can't quite see why
we'd be doing something like that during the reindex ...

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Teodor Sigaev 2019-04-30 16:32:35 Re: Failure in contrib test _int on loach
Previous Message Andres Freund 2019-04-30 15:17:35 Heap lock levels for REINDEX INDEX CONCURRENTLY not quite right?