From: | Boszormenyi Zoltan <zb(at)cybertec(dot)at> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Cousin Marc <cousinmarc(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Hans-Juergen Schoenig <hs(at)cybertec(dot)at>, Ants Aasma <ants(at)cybertec(dot)at> |
Subject: | Re: bug in fast-path locking |
Date: | 2012-04-10 06:54:54 |
Message-ID: | 4F83D93E.10606@cybertec.at |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
2012-04-09 19:32 keltezéssel, Robert Haas írta:
> On Sun, Apr 8, 2012 at 9:37 PM, Robert Haas<robertmhaas(at)gmail(dot)com> wrote:
>>> Robert, the Assert triggering with the above procedure
>>> is in your "fast path" locking code with current GIT.
>> Yes, that sure looks like a bug. It seems that if the top-level
>> transaction is aborting, then LockReleaseAll() is called and
>> everything gets cleaned up properly; or if a subtransaction is
>> aborting after the lock is fully granted, then the locks held by the
>> subtransaction are released one at a time using LockRelease(), but if
>> the subtransaction is aborted *during the lock wait* then we only do
>> LockWaitCancel(), which doesn't clean up the LOCALLOCK. Before the
>> fast-lock patch, that didn't really matter, but now it does, because
>> that LOCALLOCK is tracking the fact that we're holding onto a shared
>> resource - the strong lock count. So I think that LockWaitCancel()
>> needs some kind of adjustment, but I haven't figured out exactly what
>> it is yet.
> I looked at this more. The above analysis is basically correct, but
> the problem goes a bit beyond an error in LockWaitCancel(). We could
> also crap out with an error before getting as far as LockWaitCancel()
> and have the same problem. I think that a correct statement of the
> problem is this: from the time we bump the strong lock count, up until
> the time we're done acquiring the lock (or give up on acquiring it),
> we need to have an error-cleanup hook in place that will unbump the
> strong lock count if we error out. Once we're done updating the
> shared and local lock tables, the special handling ceases to be
> needed, because any subsequent lock release will go through
> LockRelease() or LockReleaseAll(), which will do the appropriate
> clenaup.
>
> The attached patch is an attempt at implementing that; any reviews appreciated.
This patch indeed fixes the scenario discovered by Cousin Marc.
Reading this patch also made me realize that my lock_timeout
patch needs adjusting, i.e. needs an AbortStrongLockAcquire()
call if waiting for a lock timed out.
Best regards,
Zoltán Böszörményi
--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig& Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
http://www.postgresql.at/
From | Date | Subject | |
---|---|---|---|
Next Message | Boszormenyi Zoltan | 2012-04-10 07:02:49 | Re: [PATCH] lock_timeout and common SIGALRM framework |
Previous Message | Heikki Linnakangas | 2012-04-10 06:27:50 | Re: To Do wiki |