Ancient lock bug figured out

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Ancient lock bug figured out
Date: 2000-11-27 23:30:09
Message-ID: 12872.975367809@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

proc.c has the following code --- unchanged since Postgres95 ---
in HandleDeadlock():

/* ---------------------
* Check to see if we've been awoken by anyone in the interim.
*
* If we have we can return and resume our transaction -- happy day.
* Before we are awoken the process releasing the lock grants it to
* us so we know that we don't have to wait anymore.
*
* Damn these names are LONG! -mer
* ---------------------
*/
if (IpcSemaphoreGetCount(MyProc->sem.semId, MyProc->sem.semNum) ==
IpcSemaphoreDefaultStartValue)
{
UnlockLockTable();
return;
}

/*
* you would think this would be unnecessary, but...
*
* this also means we've been removed already. in some ports (e.g.,
* sparc and aix) the semop(2) implementation is such that we can
* actually end up in this handler after someone has removed us from
* the queue and bopped the semaphore *but the test above fails to
* detect the semaphore update* (presumably something weird having to
* do with the order in which the semaphore wakeup signal and SIGALRM
* get handled).
*/
if (MyProc->links.prev == INVALID_OFFSET ||
MyProc->links.next == INVALID_OFFSET)
{
UnlockLockTable();
return;
}

Well, the reason control can get to the "apparently unnecessary" part is
not some weird portability glitch; it is that the first test is WRONG.
IpcSemaphoreGetCount() calls semctl(GETNCNT), which returns not the
current value of the semaphore as this code expects, but the number of
waiters on the semaphore. Since the process doing this test is the
only one that'll ever wait on that semaphore, it is impossible for
IpcSemaphoreGetCount() to return anything but zero, and so the first
if() has never ever succeeded in the entire history of Postgres.

IpcSemaphoreGetCount is used nowhere else. I see no particularly good
reason to have it at all, and certainly not to have it with a name so
easily mistaken for IpcSemaphoreGetValue. It's about to be toast.

Next question is whether to recode the first test "correctly" with
IpcSemaphoreGetValue(), or just remove it. Since we clearly have done
just fine with testing our internal link pointers to detect removal
from the queue, I'm inclined to remove the first test and thus save an
unnecessary kernel call. Comments?

regards, tom lane

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2000-11-27 23:33:57 Re: Constraint names using 'user namespace'?
Previous Message mlw 2000-11-27 23:28:36 8192 BLCKSZ ?