Re: Global Deadlock Information

From: Markus Wanner <markus(at)bluegap(dot)ch>
To: Satoshi Nagayasu <satoshi(dot)nagayasu(at)gmail(dot)com>
Cc: pgsql-cluster-hackers(at)postgresql(dot)org
Subject: Re: Global Deadlock Information
Date: 2010-02-06 19:15:18
Message-ID: 4B6DBFC6.6010507@bluegap.ch
Lists: pgsql-cluster-hackers
Hi,

I'm glad you are joining this discussion, thank you.

Satoshi Nagayasu wrote:
> Some developers (including me!) proposed the lock_timeout
> GUC option.
> 
> http://archives.postgresql.org/pgsql-hackers/2004-06/msg00935.php
> http://archives.postgresql.org/pgsql-hackers/2010-01/msg01167.php

Thanks for these pointers.

> I still believe the "lock timeout" feature could help
> resolving a global deadlock in the cluster environment.

Well, you'd always need to find a compromise between waiting long enough 
to not kill transactions just because of high contention, and still 
reacting promptly enough to resolve real deadlocks. I'd like to avoid 
such nifty configuration and tuning settings.

> (2) Use the global wait-for graph to detect a global deadlock.

Can you please elaborate on the replication solution that needs such a 
global wait-for graph? Why do you need a global graph, if you replicate 
all of your transactions anyway? Does that global graph imply a global 
abort decision as well?

IMO a local wait-for graph is absolutely sufficient. The problem is just 
that different nodes might reach different decisions on how to resolve 
the deadlock. But if you replicate to all nodes, they will all be able 
to "see" the deadlock, no?
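
To illustrate what I mean, here is a minimal sketch in Python (not Postgres 
code, and not how the existing deadlock detector works). It assumes every 
node holds the same wait-for graph because all locks are replicated, and 
that all nodes apply the same deterministic rule for picking a victim. The 
names find_deadlock and choose_victim and the "abort the largest 
transaction id" rule are made up for this example.

```python
from typing import Dict, List, Optional, Set

# Maps a transaction id to the set of transactions it is waiting for.
WaitForGraph = Dict[str, Set[str]]


def find_deadlock(graph: WaitForGraph) -> Optional[List[str]]:
    """Return one cycle of the wait-for graph, or None if there is none.

    Uses a depth-first search; neighbours are visited in sorted order so
    that every node holding the same graph finds the same cycle.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color: Dict[str, int] = {}
    path: List[str] = []

    def visit(txn: str) -> Optional[List[str]]:
        color[txn] = GREY
        path.append(txn)
        for blocker in sorted(graph.get(txn, ())):
            state = color.get(blocker, WHITE)
            if state == GREY:
                # blocker is already on the current path: that is a cycle
                return path[path.index(blocker):]
            if state == WHITE:
                cycle = visit(blocker)
                if cycle:
                    return cycle
        path.pop()
        color[txn] = BLACK
        return None

    for txn in sorted(graph):
        if color.get(txn, WHITE) == WHITE:
            cycle = visit(txn)
            if cycle:
                return cycle
    return None


def choose_victim(cycle: List[str]) -> str:
    """Hypothetical deterministic rule: abort the largest transaction id."""
    return max(cycle)


if __name__ == "__main__":
    # T1 and T2 wait for each other; T3 merely waits for T1.
    graph = {"T1": {"T2"}, "T2": {"T1"}, "T3": {"T1"}}
    cycle = find_deadlock(graph)
    print(cycle, choose_victim(cycle) if cycle else None)
```

As long as the graph and the rule are identical everywhere, each node can 
decide locally and still abort the same transaction. Without such a shared 
rule, two nodes could pick different victims from the same cycle, which is 
the problem described above.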

> http://en.wikipedia.org/wiki/Deadlock#Distributed_deadlock

That very article states:

"In a Commitment ordering based distributed environment (including the 
Strong strict two-phase locking (SS2PL, or rigorous) special case) 
distributed deadlocks are resolved automatically by the atomic 
commitment protocol (e.g. two-phase commit (2PC)), and no global 
wait-for graph or other resolution mechanism are needed."

And the issue with "phantom deadlocks" doesn't really excite me either, 
so I'd rather not have to deal with such things.

> I don't think the callback function is needed to replace
> the current deadlock resolution feature,

Obviously this wish list item needs more discussion. It seems we want 
two rather different things, then.

How does your replication solution cope with the current deadlock 
resolver? How do you prevent it aborting

> but I agree we need a consensus how we could avoid
> the global deadlock situation in the cluster.

How do you get to the situation where you have a global deadlock, but 
not a local one? That seems to imply that you are not replicating locks 
to all nodes.

How do you think Postgres core could help with determining such global 
deadlocks? That seems more like a solution-specific thing to me.

Are we even talking about the same level of locking, namely regular, 
heavy-weight locks (as per the storage/lmgr/README)?

Kind Regards

Markus Wanner