bug in fast-path locking

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Cc: Cousin Marc <cousinmarc(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Hans-Juergen Schoenig <hs(at)cybertec(dot)at>, Ants Aasma <ants(at)cybertec(dot)at>
Subject: bug in fast-path locking
Date: 2012-04-09 01:37:23
Message-ID: CA+TgmobyD_4_NR5wVs7N6W5be9k6F0yQLTGNg4_jV5OUvesm8A@mail.gmail.com
Lists: pgsql-hackers

On Sun, Apr 8, 2012 at 12:43 PM, Boszormenyi Zoltan <zb(at)cybertec(dot)at> wrote:
>> Indeed, the unpatched GIT version crashes if you enter
>>  =#lock TABLE pgbench_accounts ;
>> the second time in session 2 after the first one failed. Also,
>> manually spelling it out:
>>
>> Session 1:
>>
>> $ psql
>> psql (9.2devel)
>> Type "help" for help.
>>
>> zozo=# begin;
>> BEGIN
>> zozo=# lock table pgbench_accounts;
>> LOCK TABLE
>> zozo=#
>>
>> Session 2:
>>
>> zozo=# begin;
>> BEGIN
>> zozo=# savepoint a;
>> SAVEPOINT
>> zozo=# lock table pgbench_accounts;
>> ERROR:  canceling statement due to statement timeout
>> zozo=# rollback to a;
>> ROLLBACK
>> zozo=# savepoint b;
>> SAVEPOINT
>> zozo=# lock table pgbench_accounts;
>> The connection to the server was lost. Attempting reset: Failed.
>> !>
>>
>> Server log after the second lock table:
>>
>> TRAP: FailedAssertion("!(locallock->holdsStrongLockCount == 0)", File:
>> "lock.c", Line: 749)
>> LOG:  server process (PID 12978) was terminated by signal 6: Aborted
>
>
> Robert, the Assert triggering with the above procedure
> is in your "fast path" locking code with current GIT.

Yes, that sure looks like a bug. It seems there are three abort cases
to consider. If the top-level transaction is aborting, then
LockReleaseAll() is called and everything gets cleaned up properly. If
a subtransaction is aborting after the lock is fully granted, then the
locks held by the subtransaction are released one at a time using
LockRelease(). But if the subtransaction is aborted *during the lock
wait*, then we only do LockWaitCancel(), which doesn't clean up the
LOCALLOCK. Before the fast-lock patch, that didn't really matter, but
now it does, because that LOCALLOCK is tracking the fact that we're
holding onto a shared resource: the strong lock count. So I think that
LockWaitCancel() needs some kind of adjustment, but I haven't figured
out exactly what it is yet.
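
To make the failure mode concrete, here is a toy model of the sequence
in session 2. It is only a sketch and not the real lock.c: every name
prefixed with model_ is hypothetical, the single global counter stands
in for the shared strong lock count, and the "fixed" cancel routine is
just one conceivable adjustment, not a proposed patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified model; NOT PostgreSQL's lock.c. */

/* Stands in for the shared strong lock count (assumption: one counter). */
static int model_strong_lock_count = 0;

/* Stands in for the backend-local LOCALLOCK entry. */
typedef struct ModelLocalLock
{
    bool holdsStrongLockCount;  /* did we bump the shared counter? */
} ModelLocalLock;

/*
 * Start acquiring a strong lock. Returns false where the real server
 * would trip the "holdsStrongLockCount == 0" assertion.
 */
static bool
model_begin_strong_lock(ModelLocalLock *ll)
{
    if (ll->holdsStrongLockCount)
        return false;           /* stale state left by an earlier attempt */
    model_strong_lock_count++;
    ll->holdsStrongLockCount = true;
    return true;
}

/* Cancel during the wait, as described above: LOCALLOCK is left alone. */
static void
model_cancel_wait_buggy(ModelLocalLock *ll)
{
    (void) ll;                  /* nothing cleaned up */
}

/* One conceivable adjustment (assumption): give the count back on cancel. */
static void
model_cancel_wait_fixed(ModelLocalLock *ll)
{
    if (ll->holdsStrongLockCount)
    {
        model_strong_lock_count--;
        ll->holdsStrongLockCount = false;
    }
}

/*
 * Replay session 2: LOCK TABLE, cancel during the wait (statement
 * timeout, then ROLLBACK TO), LOCK TABLE again. Returns true if the
 * second attempt passes the invariant check.
 */
static bool
model_replay_session2(void (*cancel_wait) (ModelLocalLock *))
{
    ModelLocalLock ll = {false};

    model_strong_lock_count = 0;
    if (!model_begin_strong_lock(&ll))
        return false;
    cancel_wait(&ll);
    return model_begin_strong_lock(&ll);
}
```

In this model, model_replay_session2(model_cancel_wait_buggy) fails the
check on the second attempt, which corresponds to the assertion failure
in the server log, while the variant that resets the local flag and
returns the count does not.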

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
