Global Deadlock Information

From: Markus Wanner <markus(at)bluegap(dot)ch>
To: pgsql-cluster-hackers(at)postgresql(dot)org
Subject: Global Deadlock Information
Date: 2010-02-06 09:13:02
Message-ID: 4B6D329E.6050308@bluegap.ch
Lists: pgsql-cluster-hackers
Hi,

I'd like to start a thread for discussion of the second item on the 
ClusterFeatures [1] list: Global Deadlock Information.

IIRC there are two aspects to this item: a) the plain notification of a 
deadlock and b) some way to control or intercept deadlock resolution.

The problem this item seems to address is the potential for deadlocks 
between transactions on different nodes. Or put another way: between a 
local transaction and one that's to be applied from a remote node (or 
even between two remote ones - similar issue, though). To ensure 
congruency between nodes, they must take the same measures to resolve 
the deadlock, i.e. abort the same transaction(s).

I certainly disagree with the statement on the wiki that the
"statement_timeout is the way to avoid global deadlocks", because I
don't want to have to wait that long until a deadlock gets resolved.
Furthermore, it doesn't even guarantee congruency, depending on the
implementation of your clustering solution.

I fail to see how a plain notification API would help much. After all,
one node could report that it aborted transaction A to resolve a
deadlock while another node reports that it aborted transaction B.
You'd end up having to abort two (or more) transactions instead of
just one to resolve a conflict.
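
To illustrate the concern, here is a toy sketch in Python. The policy
"abort whichever transaction started waiting last on this node" is an
assumption made up for this example, not a description of what any
particular resolver does; the point is only that a purely local,
timing-dependent choice can differ between nodes.

```python
def local_victim(wait_order):
    """Toy local policy (assumed): abort the transaction that
    started waiting last on this node."""
    return wait_order[-1]

# The same lock cycle {A, B}, but each node saw the waits
# arrive in a different order.
node1_wait_order = ["A", "B"]
node2_wait_order = ["B", "A"]

# Each node resolves the deadlock on its own and then notifies.
aborted = {
    local_victim(node1_wait_order),
    local_victim(node2_wait_order),
}

# To keep the replicas congruent, every node now has to honor
# both notifications, so both transactions end up aborted.
print(sorted(aborted))
```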

It would be more useful if enabling such a notification turned off
the existing deadlock resolver and left the resolution of the deadlock
to the clustering solution. I'd call that an interception.

Such an interception API should IMO provide a way to register a
callback that replaces the current deadlock resolver. Upon detection
of a deadlock, the callback should get a list of the transaction ids
that are part of the lock cycle. It's then up to that callback to
choose one and abort it to resolve the conflict.
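
A minimal sketch of what such an interception could look like, written
in Python for brevity. All names here (register_deadlock_callback,
on_deadlock_detected, abort_highest_id) are hypothetical and not part
of any existing API. The example policy also assumes that transaction
ids are comparable and identical on all nodes, which a real clustering
solution would have to ensure by its own means.

```python
from typing import Callable, Optional, Sequence

# Hypothetical callback type: gets the transaction ids in the
# lock cycle, returns the id of the transaction to abort.
DeadlockCallback = Callable[[Sequence[int]], int]

_callback: Optional[DeadlockCallback] = None


def register_deadlock_callback(cb: Optional[DeadlockCallback]) -> None:
    """Install a callback that replaces the built-in resolver
    (pass None to restore the default)."""
    global _callback
    _callback = cb


def default_resolver(cycle: Sequence[int]) -> int:
    """Stand-in for the built-in resolver (toy policy)."""
    return cycle[0]


def on_deadlock_detected(cycle: Sequence[int]) -> int:
    """Called by the (hypothetical) detector with the lock cycle."""
    resolver = _callback if _callback is not None else default_resolver
    victim = resolver(cycle)
    if victim not in cycle:
        raise ValueError("callback must pick a transaction from the cycle")
    return victim  # the caller would now abort this transaction


def abort_highest_id(cycle: Sequence[int]) -> int:
    """Example policy: independent of the order in which the cycle
    is reported, so every node picks the same victim."""
    return max(cycle)


register_deadlock_callback(abort_highest_id)
victim = on_deadlock_detected([17, 42, 23])
```

Because the policy depends only on the set of ids in the cycle, two
nodes that detect the same cycle in a different order still abort the
same transaction.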

And now, Greg's List:
 > 1) What feature does this help add from a user perspective?

Preventing cluster-wide deadlocks (while maintaining congruency of 
replicas).

 > 2) Which replication projects would be expected to see an improvement
 > from this addition?

I suspect all multi-master solutions are affected, certainly Postgres-R 
would benefit. Single-master ones certainly don't need it.

 > 3) What makes it difficult to implement?

I don't see any real stumbling block. Deciding on an API needs consensus.

 > 4) Are there any other items on the list this depends on, or that it
 > is expected to have a significant positive/negative interaction with?

Not that I know of.

 > 5) What replication projects include a feature like this already, or a
 > prototype of a similar one, that might be used as a proof of concept
 > or example implementation?

Old Postgres-R versions once had such an interception, but the current
version lacks a solution for this problem. I don't know of any other
project that's already solved this.

 > 6) Who is already working on it/planning to work on it/needs it for
 > their related project?

I'm not currently working on it and don't plan to do so (at least) until 
PgCon 2010.


Cluster hackers, is this a good summary which covers your needs as well? 
Something missing?

Regards

Markus Wanner

[1]: feature wish list of cluster hackers:
http://wiki.postgresql.org/wiki/ClusterFeatures
