Re: bug in fast-path locking

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Cc: Cousin Marc <cousinmarc(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Hans-Juergen Schoenig <hs(at)cybertec(dot)at>, Ants Aasma <ants(at)cybertec(dot)at>
Subject: Re: bug in fast-path locking
Date: 2012-04-09 17:32:39
Message-ID: CA+TgmoaqPhECf19D82KDix_bn6S=ef0wTe6nBjju-92md9kPAw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Apr 8, 2012 at 9:37 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> Robert, the Assert triggering with the above procedure
>> is in your "fast path" locking code with current GIT.
>
> Yes, that sure looks like a bug.  It seems that if the top-level
> transaction is aborting, then LockReleaseAll() is called and
> everything gets cleaned up properly; or if a subtransaction is
> aborting after the lock is fully granted, then the locks held by the
> subtransaction are released one at a time using LockRelease(), but if
> the subtransaction is aborted *during the lock wait* then we only do
> LockWaitCancel(), which doesn't clean up the LOCALLOCK.  Before the
> fast-lock patch, that didn't really matter, but now it does, because
> that LOCALLOCK is tracking the fact that we're holding onto a shared
> resource - the strong lock count.  So I think that LockWaitCancel()
> needs some kind of adjustment, but I haven't figured out exactly what
> it is yet.

I looked at this more. The above analysis is basically correct, but
the problem goes a bit beyond an error in LockWaitCancel(). We could
also crap out with an error before getting as far as LockWaitCancel()
and have the same problem. I think that a correct statement of the
problem is this: from the time we bump the strong lock count, up until
the time we're done acquiring the lock (or give up on acquiring it),
we need to have an error-cleanup hook in place that will unbump the
strong lock count if we error out. Once we're done updating the
shared and local lock tables, the special handling ceases to be
needed, because any subsequent lock release will go through
LockRelease() or LockReleaseAll(), which will do the appropriate
clenaup.

The attached patch is an attempt at implementing that; any reviews appreciated.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
fix-strong-lock-cleanup.patch application/octet-stream 11.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2012-04-09 17:38:37 Re: why was the VAR 'optind' never changed in initdb?
Previous Message Tom Lane 2012-04-09 17:30:43 Revisiting extract(epoch from timestamp)