I'm glad you are joining this discussion, thank you.
Satoshi Nagayasu wrote:
> Some developers (including me!) proposed the lock_timeout
> GUC option.
Thanks for these pointers.
> I still believe the "lock timeout" feature could help
> resolving a global deadlock in the cluster environment.
Well, you'd always need to find a compromise between waiting long enough
not to kill transactions merely because of high contention, and reacting
promptly enough to resolve real deadlocks. I'd like to avoid such finicky
configuration and tuning settings.
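To make that compromise concrete, here is a minimal sketch of a timeout-based lock wait, using Python's `threading.Lock.acquire(timeout=...)` as a stand-in for a lock_timeout-style setting. It is not PostgreSQL code; `wait_for_lock` and `simulate` are hypothetical helpers. A holder keeps the lock for a while with no deadlock at all (pure contention), and the waiter gives up after its timeout. A short timeout aborts a perfectly healthy waiter, while a long one would equally delay resolving a real deadlock, where the holder never releases.

```python
import threading


def wait_for_lock(lock: threading.Lock, timeout: float) -> bool:
    """Try to acquire the lock, giving up after `timeout` seconds
    (a lock_timeout-style policy). Returns True if the lock was obtained."""
    acquired = lock.acquire(timeout=timeout)
    if acquired:
        lock.release()  # we only wanted to show that the wait succeeded
    return acquired


def simulate(hold_seconds: float, timeout: float) -> bool:
    """A 'holder' keeps the lock for `hold_seconds` (contention only, no
    deadlock); a waiter applies `timeout`. Returns True if the waiter won."""
    lock = threading.Lock()
    lock.acquire()  # the holder takes the lock
    # threading.Lock may be released from any thread, so a timer thread
    # plays the holder finishing its work and releasing the lock.
    releaser = threading.Timer(hold_seconds, lock.release)
    releaser.start()
    try:
        return wait_for_lock(lock, timeout)
    finally:
        releaser.join()  # wait until the holder has released the lock


if __name__ == "__main__":
    print("short timeout:", simulate(0.3, 0.1))  # waiter aborted needlessly
    print("long timeout:", simulate(0.3, 1.0))   # waiter eventually succeeds
```

No timeout value is right for both cases, which is the tuning burden referred to above.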
> (2) Use the global wait-for graph to detect a global deadlock.
Can you please elaborate on the replication solution that needs such a
global wait-for graph? Why do you need a global graph if you replicate
all of your transactions anyway? Does that global graph imply a global
abort decision as well?
IMO a local wait-for graph is absolutely sufficient. The problem is just
that different nodes might reach different decisions on how to resolve
the deadlock. But if you replicate to all nodes, they will all be able
to "see" the deadlock, no?
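As a rough sketch of why a replicated local graph could suffice: if every node holds the same wait-for graph and applies the same deterministic rule to pick a victim, all nodes reach the same abort decision without exchanging anything extra. The "abort the highest transaction id" rule below is a hypothetical choice for illustration only. It is not how PostgreSQL's own deadlock detector chooses its victim.

```python
from typing import Dict, List, Optional, Set

# Waiting transaction id -> transaction ids it is waiting for.
WaitForGraph = Dict[int, Set[int]]


def find_cycle(graph: WaitForGraph) -> Optional[List[int]]:
    """Return one cycle of transaction ids via depth-first search,
    or None if the graph is acyclic. Nodes are visited in sorted order
    so every node computes the same cycle from the same graph."""
    WHITE, GREY, BLACK = 0, 1, 2
    color: Dict[int, int] = {}
    stack: List[int] = []

    def visit(node: int) -> Optional[List[int]]:
        color[node] = GREY
        stack.append(node)
        for nxt in sorted(graph.get(node, ())):
            state = color.get(nxt, WHITE)
            if state == GREY:  # back edge: the stack from nxt is a cycle
                return stack[stack.index(nxt):]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for start in sorted(graph):
        if color.get(start, WHITE) == WHITE:
            found = visit(start)
            if found:
                return found
    return None


def choose_victim(cycle: List[int]) -> int:
    """Hypothetical deterministic rule: abort the highest id in the cycle."""
    return max(cycle)


if __name__ == "__main__":
    node_a = {1: {2}, 2: {3}, 3: {1}}
    node_b = {3: {1}, 2: {3}, 1: {2}}  # same edges, built in another order
    print(choose_victim(find_cycle(node_a)), choose_victim(find_cycle(node_b)))
```

Both nodes pick transaction 3, so the "different decisions" problem disappears as long as the graphs really are identical.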
That very article states:
"In a Commitment ordering based distributed environment (including the
Strong strict two-phase locking (SS2PL, or rigorous) special case)
distributed deadlocks are resolved automatically by the atomic
commitment protocol (e.g. two-phase commit (2PC)), and no global
wait-for graph or other resolution mechanism are needed."
And the issue with "phantom deadlocks" doesn't really excite me either,
so I'd rather not have to deal with such things.
> I don't think the callback function is needed to replace
> the current deadlock resolution feature,
Obviously this wish list item needs more discussion. It seems we want
two rather different things, then.
How does your replication solution cope with the current deadlock
resolver? How do you prevent it aborting
> but I agree we need a consensus how we could avoid
> the global deadlock situation in the cluster.
How do you get to the situation where you have a global deadlock, but
not a local one? That seems to imply that you are not replicating locks
to all nodes.
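A small sketch of that reasoning, under the assumption that each node only records the lock waits it sees locally: with partial replication, no single node's graph contains a cycle, yet the union of the graphs does. If locks were replicated to all nodes, every local graph would equal the union, and each node could detect the deadlock on its own. The node names and transaction labels are made up for illustration.

```python
from typing import Dict, Iterable, Set

# Waiting transaction -> transactions it is waiting for.
Graph = Dict[str, Set[str]]


def has_cycle(graph: Graph) -> bool:
    """Depth-first search for a cycle in a wait-for graph."""
    visiting: Set[str] = set()
    done: Set[str] = set()

    def dfs(node: str) -> bool:
        if node in visiting:  # reached a node on the current path
            return True
        if node in done:
            return False
        visiting.add(node)
        if any(dfs(n) for n in graph.get(node, ())):
            return True
        visiting.discard(node)
        done.add(node)
        return False

    return any(dfs(n) for n in list(graph))


def union(graphs: Iterable[Graph]) -> Graph:
    """Merge several per-node wait-for graphs into one."""
    merged: Graph = {}
    for g in graphs:
        for waiter, holders in g.items():
            merged.setdefault(waiter, set()).update(holders)
    return merged


if __name__ == "__main__":
    node_a = {"T1": {"T2"}}  # node A only sees T1 waiting for T2
    node_b = {"T2": {"T1"}}  # node B only sees T2 waiting for T1
    print(has_cycle(node_a), has_cycle(node_b))  # no local deadlock
    print(has_cycle(union([node_a, node_b])))    # but a global one
```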
How do you think Postgres core could help with determining such global
deadlocks? That seems more like a solution-specific thing to me.
Are we even talking about the same level of locking, namely regular,
heavy-weight locks (as per the storage/lmgr/README)?