Re: bug in fast-path locking

From: Boszormenyi Zoltan <zb(at)cybertec(dot)at>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Cousin Marc <cousinmarc(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Hans-Juergen Schoenig <hs(at)cybertec(dot)at>, Ants Aasma <ants(at)cybertec(dot)at>
Subject: Re: bug in fast-path locking
Date: 2012-04-10 06:54:54
Message-ID: 4F83D93E.10606@cybertec.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

2012-04-09 19:32 keltezéssel, Robert Haas írta:
> On Sun, Apr 8, 2012 at 9:37 PM, Robert Haas<robertmhaas(at)gmail(dot)com> wrote:
>>> Robert, the Assert triggering with the above procedure
>>> is in your "fast path" locking code with current GIT.
>> Yes, that sure looks like a bug. It seems that if the top-level
>> transaction is aborting, then LockReleaseAll() is called and
>> everything gets cleaned up properly; or if a subtransaction is
>> aborting after the lock is fully granted, then the locks held by the
>> subtransaction are released one at a time using LockRelease(), but if
>> the subtransaction is aborted *during the lock wait* then we only do
>> LockWaitCancel(), which doesn't clean up the LOCALLOCK. Before the
>> fast-lock patch, that didn't really matter, but now it does, because
>> that LOCALLOCK is tracking the fact that we're holding onto a shared
>> resource - the strong lock count. So I think that LockWaitCancel()
>> needs some kind of adjustment, but I haven't figured out exactly what
>> it is yet.
> I looked at this more. The above analysis is basically correct, but
> the problem goes a bit beyond an error in LockWaitCancel(). We could
> also crap out with an error before getting as far as LockWaitCancel()
> and have the same problem. I think that a correct statement of the
> problem is this: from the time we bump the strong lock count, up until
> the time we're done acquiring the lock (or give up on acquiring it),
> we need to have an error-cleanup hook in place that will unbump the
> strong lock count if we error out. Once we're done updating the
> shared and local lock tables, the special handling ceases to be
> needed, because any subsequent lock release will go through
> LockRelease() or LockReleaseAll(), which will do the appropriate
> clenaup.
>
> The attached patch is an attempt at implementing that; any reviews appreciated.

This patch indeed fixes the scenario discovered by Cousin Marc.

Reading this patch also made me realize that my lock_timeout
patch needs adjusting, i.e. needs an AbortStrongLockAcquire()
call if waiting for a lock timed out.

Best regards,
Zoltán Böszörményi

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig& Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
http://www.postgresql.at/

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Boszormenyi Zoltan 2012-04-10 07:02:49 Re: [PATCH] lock_timeout and common SIGALRM framework
Previous Message Heikki Linnakangas 2012-04-10 06:27:50 Re: To Do wiki