Not your father's question about deadlocks

From: Clarence Gardner <clarence(at)silcom(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Not your father's question about deadlocks
Date: 2006-11-14 22:36:04
Message-ID: Pine.SUN.4.02.10611142235290.21053-100000@beach.silcom.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

Once upon a time, I put a question regarding deadlocks to the group,
and Tom Lane immediately answered with this:

>The guy waiting on the tuple-specific lock is second in
>line to actually mung the tuple. Whoever is first in line behind the
>current tenant will be blocked trying to acquire ShareLock on the
>current tenant's transaction ID.
>
>What you appear to have is a situation where two transactions are trying
>to lock or update the same two rows in different orders. Without a lot
>more info about your application logic, I couldn't guess why this seems
>to be associated with having more than two transaction interested in the
>same tuple.
>
>Note that the guy looking for ShareLock on the tuple is evidently either
>doing SELECT FOR SHARE on this tuple, or trying to install a new tuple
>referencing this one as a foreign key (which does SELECT FOR SHARE under
>the hood). But he's blocked by someone who's done either SELECT FOR
>UPDATE or an actual UPDATE on that tuple.

I'm still creeping up on the problem. I configured postgres to resolve
a deadlock only after a month, and now I seem to have a complete
snapshot of one of them. The details are below. The part that doesn't
fit with what Tom says is that there are only two transactions involved.
The first and third "guys" are the same guy!

The first one, I'll call it X71, is the one doing the insert. The
LedgerDetail table references various other tables; the one of interest
is the buys table, and it is inserting a record referencing buy #955. It
had done several other inserts before this, some of them also referencing
buy #955. (In fact, the immediately preceding statement was one such.)
This is evidenced by the RowShareLock that it holds on the buys table.

The other one, X78, is executing a stored procedure which has only one
reference to buys, this statement:
update buys set budget=budget+remainder where id=buynum;

So the timeline is this
X71 X78
----------------------- ----------------------
insert buy955 reference
insert buy955 reference
insert buy955 reference
update buy955 (blocks)
insert buy955 reference (blocks)

Here is the state of the system. I have edited it somewhat. The pg_locks
table actually shows five waiters; the three that I've deleted started
much later than when the deadlock happened, so I've deleted all the pg_locks
rows that belonged to their transactions. I've also deleted all pg_locks
rows for X71 and X78 that referred to relations that the other transaction
had no lock on. And I've deleted empty or constant columns, and the relation
oid column.
======================================================================
27478 | insert into LedgerDetail
(ledger,strategy,advId,campaignId,buyId,creativeId,siteId,providerType,adspotId,checkbookId,count,dr,cr,dr_usd,cr_usd,time)
values('imp',',61144,',533,676,955,4076,2,5,656,31275,14,0,110936,0,142142,1163530800)
| 2006-11-14 19:22:43.760703+00 | 2006-11-14 19:22:29.342683+00
99056 | FETCH 1 FROM "PgSQL_0CDD618C"
| 2006-11-14 19:22:43.748025+00 | 2006-11-13 19:21:40.038948+00

2006-11-14 11:22:43 PST[99056]LOG: statement: DECLARE "PgSQL_0CDD618C" CURSOR FOR
select repayBuyBudget(955, 31275, fromusd(ecc2usd(7790), 'EUR'))

relation | locktype | page | tuple | transactionid | transaction | pid | mode | granted
---------------------------+---------------+-------+-------+---------------+--------------+-------+------------------+---------
buys | relation | | | | 101953371 | 27478 | AccessShareLock | t
buys | relation | | | | 101953371 | 27478 | RowShareLock | t
buys | tuple | 38 | 11 | | 101953371 | 27478 | ShareLock | f
buys_pkey | relation | | | | 101953371 | 27478 | AccessShareLock | t

| transactionid | | | 101953371 | 101954678 | 99056 | ShareLock | f
buys | relation | | | | 101954678 | 99056 | AccessShareLock | t
buys | relation | | | | 101954678 | 99056 | RowExclusiveLock | t
buys | tuple | 38 | 11 | | 101954678 | 99056 | ExclusiveLock | t
buys_pkey | relation | | | | 101954678 | 99056 | AccessShareLock | t
buys_pkey | relation | | | | 101954678 | 99056 | RowExclusiveLock | t
======================================================================

That scenario seems quite simple, but I can't reproduce the deadlock with
this seemingly-identical sequence. The "Sn" prompts show the sequence of
statements done from psql in two terminal sessions.
S1 => create table t1(f1 int primary key);
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "t1_pkey" for table "t1"
CREATE TABLE
S2 => create table t2(f1 int references t1);
CREATE TABLE
S3 => insert into t1 values(1);
INSERT 0 1
S4 => begin;
BEGIN
S7 => update t1 set f1=1; (blocks)

S5 => begin;
BEGIN
S6 => insert into t2 values(1);
INSERT 0 1
S8 => insert into t2 values(1);
INSERT 0 1

S8 is the statement that seems analagous to the one that blocks in the
real system, but it doesn't here.

Apart from that test being different somehow that I can't figure out,
situation on the real system seems to suggest that I need to either
lock all the reference tables at the beginning of X71 (uncomfortable),
or have it commit after each insert (unacceptable).

Thanks in advance for your help.

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Paul Silveira 2006-11-14 22:44:38 Using SAN Splits to instantly copy a DB
Previous Message Martijn van Oosterhout 2006-11-14 22:30:04 Re: The old Insert and retrieving your Serial problem in VB

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Frost 2006-11-14 22:40:03 Re: [HACKERS] Replication documentation addition
Previous Message Bruce Momjian 2006-11-14 22:32:21 Re: [HACKERS] Replication documentation addition