Avoiding deadlock errors in CREATE INDEX CONCURRENTLY

From: "Goel, Dhruv" <goeldhru(at)amazon(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Avoiding deadlock errors in CREATE INDEX CONCURRENTLY
Date: 2019-05-15 08:15:04
Message-ID: 6BF27921-C662-4C4A-A9DC-4E124C180CAF@amazon.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,
Currently any DDL operations (Create Indexes, Drop Indexes etc.) when run during an existing concurrent index build on the same table causes the index build to fail with “deadlock detected”. This is a pain-point specially when we want to kick-off multiple concurrent index builds on the same table; the index build will reach phase 3 (consuming resources) and then fail with deadlock errors.

I have a patch that might improve the build times and reduce deadlock occurrences. Is this something the community would be interested in? I might be missing some documentation changes in the patch but wanted to get some feedback on the functional aspect of the patch first.

Problem:
In the Concurrent Index creation implementation there are three waits that are relevant:

1. Wait 1 at start of Phase 2: Postgres waits for all transactions that started before this transaction and conflict with “Share Lock” on this relation. This is to make sure from this point forward all HOT updates to the table will be compatible with the new index.
2. Wait 2 at the start of Phase 3: Postgres waits for all transactions that started before this transaction and conflict with “Share Lock” on this relation.
3. Wait 3 at the end of Phase 3: PG waits for all transactions that started before this transaction primarily because they should not start using the index as they might be using an older snapshot and the index does not have all the entries (missing deleted tuples) for snapshot.

Typically, all the three wait states can cause deadlocks. Deadlocks due to the third wait state is reproduced by transactions that are waiting for a lock to be freed from “CREATE INDEX CONCURRENTLY” will cause deadlocks (primarily DDLs). The former 2 waits are much harder to reproduce with the test case being a Insert/Update/Delete as first statement of the transaction and then another DDL which causes lock escalation.

Proposed Solution:
We remove the third wait state completely from the concurrent index build. When we mark the index as ready, we also mark “indcheckxmin” to true which essentially enforces Postgres to not use this index for older snapshots.

Tests:
Added an isolation test which breaks without the patch. Manual test with a Repeatable Read Transaction that has an older snapshot with a tuple that has been deleted since and not part of the index.

May the force be with you,
Dhruv

Attachment Content-Type Size
patch-cic.patch application/octet-stream 6.4 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2019-05-15 08:36:32 Re: PostgreSQL 12: Feature Highlights
Previous Message Masahiko Sawada 2019-05-15 06:44:22 Re: vacuumdb and new VACUUM options