Re: BUG #8434: Why does dead lock occur many times ?

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: katsumata(dot)tomonari(at)po(dot)ntts(dot)co(dot)jp
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #8434: Why does dead lock occur many times ?
Date: 2013-10-04 17:22:17
Message-ID: 20131004172217.GW5235@eldon.alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

I spent a lot of time looking at this issue. Your problem scenario can
be reduced to the following test case:

CREATE TABLE t (a INT);
INSERT INTO t VALUES (1);

The three sessions need to execute the following commands, in this
sequence:

-- Session 1
/* 1 */ BEGIN;
SELECT * FROM t FOR UPDATE;

-- Session 2
/* 2 */ BEGIN;
UPDATE t SET a = 2; -- blocks waiting on S1

-- Session 3
/* 3 */ BEGIN;
UPDATE t SET a = 3; -- blocks
-- this is waiting on S1, but since S2 is also waiting on S1 and will acquire a
-- conflicting lock first, this session will only be released after S2 releases
-- its own lock.

Session 1
/* 4 */ UPDATE t SET a = 4;
COMMIT; -- releases session 2
BEGIN;
SELECT * FROM t FOR UPDATE; -- blocks waiting on S2 (*)

Session 2:
/* 5 */ COMMIT; -- releases session 1
BEGIN;
UPDATE t SET a = 5; -- blocks waiting on S3

Session 1:
/* 6 */ UPDATE t SET a = 6; -- blocks waiting on S3

At this point, all sessions are blocked on their UPDATE commands, and
deadlock is reported. (For debugging, it's useful to set a very large
deadlock_timeout and examine pg_locks, etc.)

The point at which things started to go wrong was where the session 1's
SELECT FOR UPDATE with the (*) was allowed to continue after session 2
commits. In 9.2 that session remains blocked, and instead we release
session 3.

The problem is the handling of following of the update chain during a
lock command; the question is should sessions be blocked waiting for
locks on future versions of the row? They currently don't block, which
is what leads the SELECT FOR UPDATE to continue; but if we make them
block, other things break. I initially toyed with the following patch,
which removes the deadlock the above report and also your original test
case:

--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -4832,15 +4832,12 @@ l4:
xmax = HeapTupleHeaderGetRawXmax(mytup.t_data);

/*
- * If this tuple is updated and the key has been modified (or
- * deleted), what we do depends on the status of the updating
- * transaction: if it's live, we sleep until it finishes; if it has
- * committed, we have to fail (i.e. return HeapTupleUpdated); if it
- * aborted, we ignore it. For updates that didn't touch the key, we
- * can just plough ahead.
+ * If this tuple is updated, what we do depends on the status of the
+ * updating transaction: if it's live, we sleep until it finishes; if
+ * it has committed, we have to fail (i.e. return HeapTupleUpdated); if
+ * it aborted, we ignore it.
*/
- if (!(old_infomask & HEAP_XMAX_INVALID) &&
- (mytup.t_data->t_infomask2 & HEAP_KEYS_UPDATED))
+ if (!(old_infomask & HEAP_XMAX_INVALID))
{
TransactionId update_xid;

However, the differences in behavior this causes (visible by running the
isolation tests) don't look good to me. It's quite possible that there
are other bugs elsewhere.

Sadly, I have to look at other things now and I won't have time to
research this further until after next week's minor releases; so
whatever we decide to do, if anything, will not be in 9.3.1.

Thanks for the report.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Stefan Kaltenbrunner 2013-10-04 20:20:46 Re: BUG #8469: Xpath behaviour unintuitive / arguably wrong
Previous Message kyrimis 2013-10-04 13:26:11 BUG #8500: Upgrade to postgis 2.1 breaks existing databases