Re: Patch: Global Unique Index

From: David Zhang <david(dot)zhang(at)highgo(dot)ca>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <stark(at)mit(dot)edu>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>, Cary Huang <cary(dot)huang(at)highgo(dot)ca>, Pgsql Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Patch: Global Unique Index
Date: 2022-12-02 23:29:25
Message-ID: 57cc769a-1969-7f2a-89e6-4aaefb9843fe@highgo.ca
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks a lot for all the comments.

On 2022-11-29 3:13 p.m., Tom Lane wrote:
> ... not to mention creating a high probability of deadlocks between
> concurrent insertions to different partitions. If they each
> ex-lock their own partition's index before starting to look into
> other partitions' indexes, it seems like a certainty that such
> cases would fail. The rule of thumb about locking multiple objects
> is that all comers had better do it in the same order, and this
> isn't doing that.
In the current POC patch, the deadlock is happening when backend-1
inserts a value to index X(partition-1), and backend-2 try to insert a
conflict value right after backend-1 released the buffer block lock but
before start to check unique on index Y(partition-2). In this case,
backend-1 holds ExclusiveLock on transaction-1 and waits for ShareLock
on transaction-2 , while backend-2 holds ExclusiveLock on transaction-2
and waits for ShareLock on transaction-1. Based on my debugging tests,
this only happens when backend-1 and backend-2 want to insert a conflict
value. If this is true, then is it ok to either `deadlock` error out or
`duplicated value` error out since this is a conflict value? (hopefully
end users can handle it in a similar way). I think the probability of
such deadlock has two conditions: 1) users insert a conflict value and
plus 2) the uniqueness checking happens in the right moment (see above).
> That specific issue could perhaps be fixed by having everybody
> examine all the indexes in the same order, inserting when you
> come to your own partition's index and otherwise just checking
> for conflicts. But that still means serializing insertions
> across all the partitions. And the fact that you need to lock
> all the partitions, or even just know what they all are,
Here is the main change for insertion cross-partition uniqueness check
in `0004-support-global-unique-index-insert-and-update.patch`,
     result = _bt_doinsert(rel, itup, checkUnique, indexUnchanged,
heapRel);

+    if (checkUnique != UNIQUE_CHECK_NO)
+        btinsert_check_unique_gi(itup, rel, heapRel, checkUnique);
+
     pfree(itup);

where, a cross-partition uniqueness check is added after the index tuple
btree insertion on current partition. The idea is to make sure other
backends can find out the ongoing index tuple just inserted (but before
marked as visible yet), and the current partition uniqueness check can
be skipped as it has already been checked. Based on this change, I think
the insertion serialization can happen in two cases: 1) two insertions
happen on the same buffer block (buffer lock waiting); 2) two ongoing
insertions with duplicated values (transaction id waiting);

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Zheng Li 2022-12-02 23:48:59 Re: Support logical replication of DDLs
Previous Message Paul Jungwirth 2022-12-02 22:18:51 Think-o in foreign key comments