Re: Global Deadlock Information

From: Koichi Suzuki <koichi(dot)szk(at)gmail(dot)com>
To: Satoshi Nagayasu <satoshi(dot)nagayasu(at)gmail(dot)com>
Cc: Markus Wanner <markus(at)bluegap(dot)ch>, pgsql-cluster-hackers(at)postgresql(dot)org
Subject: Re: Global Deadlock Information
Date: 2010-02-06 16:23:36
Message-ID: ef4f49ae1002060823j333aa556nf9028a885367d4f@mail.gmail.com
Lists: pgsql-cluster-hackers

Hi,

I'm very interested in how long it takes to detect a global
deadlock using the global wait-for graph, and whether global deadlock
detection disturbs other ongoing transactions.

----------
Koichi Suzuki

2010/2/7 Satoshi Nagayasu <satoshi(dot)nagayasu(at)gmail(dot)com>:
> Hi Markus,
>
> I tried two ways of resolving the global deadlock situation
> during PostgresForest development.
>
> (1) Use lock_timeout to avoid a global deadlock.
>
> The lock_timeout feature is a very simple way to avoid
> the global deadlock situation.
>
> I also disagree that "statement_timeout is the way to avoid
> global deadlocks", because statement_timeout also kills
> healthy long-running transactions when it expires.
>
> Some developers (including me!) proposed the lock_timeout
> GUC option.
>
> http://archives.postgresql.org/pgsql-hackers/2004-06/msg00935.php
> http://archives.postgresql.org/pgsql-hackers/2010-01/msg01167.php
>
> I still believe the "lock timeout" feature could help
> resolve a global deadlock in a cluster environment.
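
A minimal Python sketch of the lock-timeout idea, for illustration only.
It uses an in-process threading.Lock as a stand-in for a database lock;
the helper name acquire_with_timeout and the 50 ms value are assumptions,
not part of any proposed patch. The point is that only the time spent
waiting for the lock is bounded, so a long-running statement that never
blocks on a lock is unaffected.

```python
import threading


def acquire_with_timeout(lock, timeout_s):
    """Wait at most timeout_s seconds for the lock; True if acquired."""
    return lock.acquire(timeout=timeout_s)


# Stand-in for a row lock that another transaction already holds.
row_lock = threading.Lock()
row_lock.acquire()

# The waiter gives up after 50 ms instead of blocking forever, so a
# lock cycle spanning several nodes cannot hang indefinitely.
got_it = acquire_with_timeout(row_lock, 0.05)
if not got_it:
    print("lock wait timed out; abort and retry this transaction")

row_lock.release()
```

The trade-off is that a timeout cannot tell a real deadlock from a lock
that is merely held for a long time, so the value has to be tuned.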
>
> (2) Use the global wait-for graph to detect a global deadlock.
>
> I had an experimental implementation that used the global wait-for
> graph to prevent the global deadlock.
>
> http://en.wikipedia.org/wiki/Deadlock#Distributed_deadlock
>
> I used the node(server) identifiers and the pg_locks information
> to build the global wait-for graph, and the kill signal
> (or pg_cancel()?) to abort a victim transaction causing
> the deadlock.
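
A rough Python sketch of the wait-for graph approach described above,
not the PostgresForest code. It assumes each node can report
(waiter, holder) pairs, for example derived from pg_locks, and that
transactions carry identifiers that are comparable across nodes; the
function names and the sample data are hypothetical.

```python
from collections import defaultdict


def build_global_graph(per_node_edges):
    """Merge per-node (waiter, holder) pairs into one wait-for graph."""
    graph = defaultdict(set)
    for _node_id, edges in per_node_edges.items():
        for waiter, holder in edges:
            graph[waiter].add(holder)
    return graph


def find_cycle(graph):
    """Return one cycle as a list of transaction ids, or None."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = defaultdict(int)  # every transaction starts WHITE
    path = []

    def visit(txn):
        color[txn] = GREY
        path.append(txn)
        for nxt in sorted(graph.get(txn, ())):
            if color[nxt] == GREY:  # back edge: nxt is on the current path
                return path[path.index(nxt):]
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[txn] = BLACK
        return None

    for txn in sorted(graph):
        if color[txn] == WHITE:
            cycle = visit(txn)
            if cycle:
                return cycle
    return None


# Neither node sees a cycle locally; only the merged graph has one.
edges = {"nodeA": [("T1", "T2")], "nodeB": [("T2", "T1")]}
cycle = find_cycle(build_global_graph(edges))
victim = max(cycle) if cycle else None  # one possible victim rule
print(cycle, victim)
```

The traversal itself is linear in the number of transactions and wait
edges; in a real cluster the cost of collecting the lock information
from every node would probably dominate.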
>
> I don't think the callback function is needed to replace
> the current deadlock resolution feature,
> but I agree we need a consensus on how to avoid
> the global deadlock situation in the cluster.
>
> Thanks,
>
> On 2010/02/06 18:13, Markus Wanner wrote:
>>
>> Hi,
>>
>> I'd like to start a thread for discussion of the second item on the
>> ClusterFeatures [1] list: Global Deadlock Information.
>>
>> IIRC there are two aspects to this item: a) the plain notification of a
>> deadlock and b) some way to control or intercept deadlock resolution.
>>
>> The problem this item seems to address is the potential for deadlocks
>> between transactions on different nodes. Or put another way: between a
>> local transaction and one that's to be applied from a remote node (or
>> even between two remote ones - similar issue, though). To ensure
>> congruency between nodes, they must take the same measures to resolve
>> the deadlock, i.e. abort the same transaction(s).
>>
>> I certainly disagree with the statement on the wiki that the
>> "statement_timeout is the way to avoid global deadlocks", because I
>> don't want to have to wait that long until a deadlock gets resolved.
>> Further it doesn't even guarantee congruency, depending on the
>> implementation of your clustering solution.
>>
>> I fail to see how a plain notification API would help much. After all,
>> this could result in one node reporting that it aborted transaction A to
>> resolve a deadlock while another node reports that it aborted
>> transaction B. You'd end up having to abort two (or more) transactions
>> instead of just one to resolve a conflict.
>>
>> It could be more useful if enabling such a notification turned off
>> the existing deadlock resolver and left the resolution of the deadlock
>> to the clustering solution. I'd call that an interception.
>>
>> Such an interception API should IMO provide a way to register a
>> callback, which replaces the current deadlock resolver. Upon detection
>> of a deadlock, the callback should get a list of transaction ids that
>> are part of the lock cycle. It's then up to that callback to choose one
>> and abort it to resolve the conflict.
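
A small Python sketch of what such an interception hook might look like.
Every name here (register_deadlock_resolver, on_deadlock_detected,
abort_highest_txid) is hypothetical, and it assumes the transaction ids
passed to the callback are cluster-wide identifiers. The victim rule is
deterministic, so every node picks the same transaction regardless of
the order in which it discovered the cycle.

```python
_resolver = None  # hypothetical slot for the registered callback


def register_deadlock_resolver(callback):
    """Install a callback that replaces the default victim selection."""
    global _resolver
    _resolver = callback


def on_deadlock_detected(cycle_txids):
    """Called by a (hypothetical) detector with the txids in the cycle."""
    if _resolver is None:
        # Stand-in for the built-in resolver: abort whoever detected it.
        return cycle_txids[0]
    victim = _resolver(list(cycle_txids))
    if victim not in cycle_txids:
        raise ValueError("resolver must choose a transaction in the cycle")
    return victim


def abort_highest_txid(cycle_txids):
    """Deterministic rule: same victim on every node for the same cycle."""
    return max(cycle_txids)


register_deadlock_resolver(abort_highest_txid)

# Two nodes see the same cycle in a different order but agree on the victim.
victim_on_node_a = on_deadlock_detected([101, 205, 150])
victim_on_node_b = on_deadlock_detected([150, 101, 205])
print(victim_on_node_a, victim_on_node_b)
```

Without the registered callback, the stand-in default would return 101
on one node and 150 on the other, which is the incongruent outcome the
plain notification approach risks.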
>>
>> And now, Greg's List:
>>  > 1) What feature does this help add from a user perspective?
>>
>> Preventing cluster-wide deadlocks (while maintaining congruency of
>> replicas).
>>
>>  > 2) Which replication projects would be expected to see an improvement
>>  > from this addition?
>>
>> I suspect all multi-master solutions are affected, certainly Postgres-R
>> would benefit. Single-master ones certainly don't need it.
>>
>>  > 3) What makes it difficult to implement?
>>
>> I don't see any real stumbling block. Deciding on an API needs consensus.
>>
>>  > 4) Are there any other items on the list this depends on, or that it
>>  > is expected to have a significant positive/negative interaction with?
>>
>> Not that I know of.
>>
>>  > 5) What replication projects include a feature like this already, or a
>>  > prototype of a similar one, that might be used as a proof of concept
>>  > or example implementation?
>>
>> Old Postgres-R versions once had such an interception, but it currently
>> lacks a solution for this problem. I don't know of any other project
>> that's already solved this.
>>
>>  > 6) Who is already working on it/planning to work on it/needs it for
>>  > their related project?
>>
>> I'm not currently working on it and don't plan to do so (at least) until
>> PgCon 2010.
>>
>>
>> Cluster hackers, is this a good summary which covers your needs as well?
>> Something missing?
>>
>> Regards
>>
>> Markus Wanner
>>
>> [1]: feature wish list of cluster hackers:
>> http://wiki.postgresql.org/wiki/ClusterFeatures
>>
>>
>
>
> --
> NAGAYASU Satoshi <satoshi(dot)nagayasu(at)gmail(dot)com>
>
