Deadlock bug

From: Joel Jacobson <joel(at)gluefinance(dot)com>
To: glue(at)pgexperts(dot)com, Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Deadlock bug
Date: 2010-08-20 13:11:11
Message-ID: AANLkTimo9XVcEzfiBR-ut3KVNDkjm2Vxh+t8kAmWjPuv@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

(Magnus and pghackers, I've included you in this email, since it appears to
be PostgreSQL bug. The example below is general, and not specific to Glue
Finance database model. Feel free to share it with anyone.)

I've just tried to replicate the deadlock in 8.4.4 and 9.0b4.
Same problem in both versions.
8.4.4 example: http://screencast.com/t/ZTBlMTBmNTc

---- start of comments, specific to Glue Finance database ----

(1) Orders.SessionID is not really necessary, we only store it to log what
session created which order. We never use this information, it is merely
saved for logging purposes.
Dropping the foreign key...
"orders_sessionid_fkey" FOREIGN KEY (sessionid) REFERENCES
sessions(sessionid)
...would mean we risk data integrity problems if the session would be
deleted (which it never is), even if it would be deleted, we wouldn't really
care since it just for logging purposes.

(2) Setting Orders.Heartbeat to now() on each
intended-to-be-most-of-the-times-read-only-until-something-happends-request
(aka Get_Server_Request) is of course a huge performance hit, as it require
a row exclusive lock, meaning such requests cannot be performed in
parallell.
We will therefore remove the Orders.Heartbeat column entirely.

(3) Making sure Orders is always locked first, before obtaining the Sessions
lock, would like you suggest also solve the problem, but requires a larger
rewrite of probably a lot of functions.
Removing the foreign key means we don't have to rewrite the functions.

(4) Fix the PostgreSQL bug.

(1) would effectively solve the deadlock issue, but not the performance
issue, we should therefore do (2) as well.
---- end of comments, specific to Glue Finance database ----

I think this clearly looks like a bug in PostgreSQL because of the following
observations:

Below are comments to the screencast at http://screencast.com/t/NTk2Y2VhMW

The following example is not specific for Glue Finance database.
Attached, please find the text file with the queries and simple example
schema.

1. Process 1 executes "UPDATE A SET Col1 = 1 WHERE AID = 1;".
We can see it obtains two RowExclusiveLocks on relations "a_pkey" and "a".
This is the expected result.

2. Process 2 then executes "UPDATE B SET Col2 = 1 WHERE BID = 2;".
We can see it obtains two RowExclusiveLocks on relations "b_pkey" and "b".
I don't know if this is expected, since the row in B references the row in A
being updated by process 1.
Because of the foreign key, shouldn't some kind of share lock on A be
obtained by process 2, or some other kind of lock?

3. Process 1 tries to execute "UPDATE B SET Col2 = 1 WHERE BID = 2;" and
will of course have to wait, because process 2 already has a
RowExclusiveLock on the same row in table B.

Process 1 is now waiting...

4. Now, in the other SQL prompt (process 2), we take a look at the vLocks
view.
Unexpected observations:
a) both processes have been granted a RowExclusiveLock on table B. How can
both be granted a RowExclusiveLock on the same table? Since the table only
contains one row, it must be a lock on the same row, which should be
impossible, right?
b) process 1 (which is currently waiting) has been granted a lock of type
"tuple", page 0, tuple 1, mode "ExclusiveLock" on table B. I don't know what
a "tuple" lock is, but what surprises me is process 1 being granted the
lock, and not process 2 (since process 2 updated B before 1).

Now, while process 1 is waiting, let's execute the same query in process 2:

5. Process 2 tries to execute "UPDATE B SET Col2 = 1 WHERE BID = 2;" which
is exactly the same query as in step 2 above.
Since process 2 already hold a granted RowExclusiveLock on the row in table
B it tries to update, I think this query should be executed instantly
without any problem. Instead, it causes a deadlock in process 2, allowing
process 1 to commit. Very strange.

Could this have any other explanation than a bug (or perhaps feature) in
postgres?

--
Best regards,

Joel Jacobson
Glue Finance

E: jj(at)gluefinance(dot)com
T: +46 70 360 38 01

Postal address:
Glue Finance AB
Box 549
114 11 Stockholm
Sweden

Visiting address:
Glue Finance AB
Birger Jarlsgatan 14
114 34 Stockholm
Sweden

Attachment Content-Type Size
deadlock.sql application/octet-stream 1.5 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2010-08-20 13:15:48 Re: CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!
Previous Message Tom Lane 2010-08-20 13:10:51 Re: Why assignment before return?