Skip site navigation (1) Skip section navigation (2)

Questionable coding in proc.c & lock.c

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Questionable coding in proc.c & lock.c
Date: 2000-07-27 20:14:20
Message-ID: 9030.964728860@sss.pgh.pa.us (view raw or flat)
Thread:
Lists: pgsql-hackers
I spent some time looking around for possible causes of the recently
reported deadlock conditions.  I couldn't find any smoking gun, but
I found a couple of things that look fishy:

1. I notice that ProcReleaseAll does not act quite the same way that
ProcRelease does.  Look in particular at the logic for handling
wakeupNeeded in each one.  There is some useless code in ProcRelease,
but the bottom line is that it *will* call ProcLockWakeup() unless it
finds there are no remaining holders (and deletes the lock instead).
OTOH, ProcReleaseAll will only call ProcLockWakeup() if this test
succeeds for some lockmode:

                if (!wakeupNeeded && xidLook->holders[i] > 0 &&
                    lockMethodTable->ctl->conflictTab[i] & lock->waitMask)
                    wakeupNeeded = true;

I spent some time trying to see a way that this might fail to wake up
a waiter who needs to be woken up.  I don't see one, but either this
code is broken or ProcRelease is doing unnecessary work.


2. The loop in ProcLockWakeup() has been broken since the day it was
written.  It's trying to scan through the list of waiting processes
and awaken all that can legitimately be awoken.  However, as soon as
it finds one that cannot be woken, it sets last_locktype = proc->token
and then continues the loop *without advancing proc*.  All subsequent
iterations will compare proc->token == last_locktype, find they are
equal, and then continue --- again without advancing proc.  It's not
an infinite loop because there's a list-length counter, but none of
the proc entries beyond the first unawakenable one will be examined.

The net effect is that the behavior is the same as it was in 6.3 or so,
when the loop was just "while (LockResolveConflicts(...) succeeds) do
remove-and-awaken-first-entry".

So the question is, is there ever a case where it is necessary to wake
processes that are in the list beyond one that can't be woken?
I haven't been able to conjure up an example, but I am suspicious.

This error cannot explain any new hangs seen in 7.0, since the effective
behavior of ProcLockWakeup() hasn't changed since Postgre95.  But
perhaps someone made a change elsewhere relying on the assumption that
ProcLockWakeup() would wake up any available waiter.


Comments anyone?

			regards, tom lane

Responses

pgsql-hackers by date

Next:From: Tom LaneDate: 2000-07-27 20:23:14
Subject: Re: Is Pg 7.0.x's Locking Mechanism BROKEN?
Previous:From: Kovacs Zoltan SandorDate: 2000-07-27 20:00:08
Subject: Re: Industrial-Strength Logging

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group