Re: BUG #3242: FATAL: could not unlock semaphore: error code 298

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Marcin Waldowski <M(dot)Waldowski(at)sulechow(dot)net>, pgsql-bugs(at)postgresql(dot)org, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BUG #3242: FATAL: could not unlock semaphore: error code 298
Date: 2007-04-20 17:01:06
Message-ID: 4628F1D2.6050302@hagander.net
Lists: pgsql-bugs pgsql-hackers

Tom Lane wrote:
> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>> Tom Lane wrote:
>>> How is it possible for a semaphore to be unlocked "too many times"?
>>> It's supposed to be a running counter of the net V's minus P's, and
>>> yes it had better be able to count higher than one. Have we chosen
>>> the wrong Windows primitive to implement this?
>
>> No, it's definitely the right primitive. But we're creating it with a max
>> count of 1.
>
> That's definitely wrong. There are at least three reasons for a PG
> process's semaphore to be signaled (heavyweight lock release, LWLock
> release, pin count waiter), and at least two of them can occur
> concurrently (eg, if deadlock checker fires, it will need to take
> LWLocks, but there's nothing saying that the original lock won't be
> released while it waits for an LWLock).
>
> The effective max count on Unixen is typically in the thousands,
> and I'd suggest the same on Windows unless there's some efficiency
> reason to keep it small (in which case, maybe ten would do).

AFAIK there's no problem with huge numbers (the maximum count argument is
an int32, and the documentation says nothing about a limit; I'm sure it's
just a 32-bit counter in the kernel). I'll give that a shot.

Marcin - can you test a source patch? Or should I try to build you a
binary for testing? It'd be good if you can confirm that it works before
we commit anything, I think.

> I'm astonished that we've not seen this reported before. Has the
> Windows sema code always been like that?

It could be an 8.2 problem, actually, since we had new semaphore code
there. Looking at
http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/port/win32/Attic/sema.c?rev=1.13;content-type=text%2Fx-cvsweb-markup,
it looks like we may have used a *semaphore* with a maximum count of just
one, but then also kept a counter in userspace... (I haven't looked through
the details of the code, but that's how it looks at a casual glance.)

//Magnus
