tablecmds: Deadlock caused by Attach Partition

From: Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: tablecmds: Deadlock caused by Attach Partition
Date: 2026-01-30 03:15:51
Message-ID: CFACA0EB-7E6F-4FAA-9ACE-1FC2226D7482@gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Hacker,

This issue was initially discovered while working on patch [1], which makes REPLICA IDENTITY recurse from a partitioned table down to all partitions. The deadlock test itself was originally designed by Dmitry Dolgov (see [2]). After further investigation, I confirmed that this deadlock is actually a more general problem and can be reliably reproduced on the current master branch.

How to reproduce?
==============

1. Edit the attached script with changing the db name to your own, then run the attached sh script in a terminal. Basically the script creates a deep partition chain: test1<-test2<-test3<-… <-test1000.
2. In the other thermal, start a psql session, and repeatedly run “ALTER TABLE test1 ALTER COLUMN id SET DEFAULT 0;”, then you should quickly see a deadlock error reported in the server log.

My analysis
=========

Let’s first look at the server log:
```
2026-01-29 16:36:14.308 CST [50910] DETAIL: Process 50910 waits for AccessExclusiveLock on relation 38382 of database 16384; blocked by process 69148.
Process 69148 waits for AccessShareLock on relation 38376 of database 16384; blocked by process 50910.
Process 50910: ALTER TABLE test1 ALTER COLUMN id SET DEFAULT 0;
Process 69148: alter table test334 attach partition test335 for values from (335) to (1000000000);
```

Here relation 38376 was table test29, and relation 38382 was table test30. In other words:

* ALTER TABLE (let’s call it session A) was waiting for a lock on table30,
* ATTACH PARTITION (let’s call it session B) was waiting for a lock on test29.

At the time:

* session A held lock on test29 and requesting lock on test30;
* session B held lock on test30 and requesting lock on test29;

This circular wait leads directly to the deadlock. I then analyzed the lock behavior of both sessions.

For Session A: ALTER COLUMN SET DEFAULT:

* ALTER TABLE aways locks the target table, so test1 locked first
* Since SET DEFAULT recurses to partitions, ATSimpleRecursion() is used to propagate the command. During this process, find_all_inheritors() is called to locate all partitions, sort them by OID, and then lock them one by one.
* As a result, locks are acquired in the order test2, test3, … up to the last existing partition.

This explains why session A held a lock on test29 and was requesting a lock on test30.

From Session B: ATTACH PARTITION:

Session B was requesting an AccessShareLock. Based on this, I traced the lock request to generate_partition_qual(), which is a recursive function. It walks up the partition tree to collect all partition quals, meaning that ATTACH PARTITION locks all ancestors in reverse OID order.

That explains why session B held a lock on test30 and was requesting a lock on test29.

Proposed solution
==============

The root cause is inconsistent lock ordering.

A simple solution is to pre-lock all ancestors for ATTACH PARTITION in OID order. That way, when generate_partition_qual() later recurses up the partition tree, all required locks have already been acquired.

One important detail is that these ancestor locks must be taken before the ALTER TABLE infrastructure locks the target table; otherwise, the lock order still cannot be guaranteed. With this in mind, I found that RangeVarCallbackForAlterRelation() is the appropriate place to perform this pre-locking.

Please see the attached v1 patch for the implementation.

Potential issue remaining
===================

Currently, find_all_inheritors() sorts all inheritors purely by OID. In most cases this works, because users typically create a partitioned table first and then add partitions later, so the partitioned table’s OID is smaller than those of its partitions.

However, this is not always true. For example, if a user creates a regular table A first, then creates a partitioned table P, and later attaches A to P, P’s OID will be larger than A’s. If such a case is reported in the future, I think we should enhance find_all_inheritors() to always lock the parent first, and only sort siblings at the same level by OID.

This patch has not addressed that potential issue.

[1] https://postgr.es/m/CAEoWx2nJ71hy8R614HQr7vQhkBReO9AANPODPg0aSQs74eOdLQ@mail.gmail.com
[2] https://www.postgresql.org/message-id/flat/201902041630.gpadougzab7v%40alvherre.pgsql

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Attachment Content-Type Size
nocfbot_nested_partitions.sh application/octet-stream 344 bytes
v1-0001-Prevent-deadlock-between-ALTER-TABLE-recursion-an.patch application/octet-stream 4.4 KB

Browse pgsql-hackers by date

  From Date Subject
Previous Message Fujii Masao 2026-01-30 03:12:38 Re: BUG: Cascading standby fails to reconnect after falling back to archive recovery