Re: Advisory locks seem rather broken

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org, Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Subject: Re: Advisory locks seem rather broken
Date: 2012-05-04 14:32:36
Message-ID: CA+TgmoZQf092bGzoAZ-2+9ZZbhjW+1o-is_a33G1FoOQZMOW+A@mail.gmail.com
Lists: pgsql-hackers

On Fri, May 4, 2012 at 10:21 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> Originally, I thought that the patch should include some kind of
>> accounting mechanism to prevent that from happening, where we'd keep
>> track of the number of fast-path locks that were outstanding and make
>> sure to keep that many slots free in the main lock table, but Noah
>> talked me out of it, on theory that (1) it was very unlikely to occur
>> in practice and (2) if it did occur, then you probably need to bump up
>> max_locks_per_transaction anyway and (3) it amounted to forcing
>> failures in cases where that might not be strictly necessary, which is
>> usually not a great thing to do.
>
> I agree with that, as long as we can be sure that the system behaves
> sanely (doesn't leave the data structures in a corrupt state) when an
> out-of-memory condition does occur.

OK. I believe that commit 53c5b869b464d567c3b8f617201b49a395f437ab
robustified this code path quite a bit, but it is certainly possible
that there are remaining oversights, and I would appreciate any
further review you have time to do. Basically, if there are further
bugs, the likely failure modes would be either (1) failing to track
the strong lock counts properly, leading to performance degradation
if the counters become permanently stuck at a value other than zero
even after all the locks are gone, or (2) somehow muffing the
migration of a lock from the fast-path mechanism to the regular
mechanism.

When taking a strong lock, the idea is that the strong locker first
bumps the strong lock count. That bump must be unwound if we fail to
acquire the lock, which means it has to be cleaned up both in the
error path and in any case where we give up (e.g. a conditional
acquire of a contended lock). Next, we iterate through all the
backend slots and
transfer fast path locks for each one individually. If we fail midway
through, the strong locker must simply make sure to unwind the strong
lock count. The weak lockers whose locks got transferred are fine:
they need to know how to cope with releasing a transferred lock
anyway; whether the backend that did the transfer subsequently blew up
is not something they have any need to care about. Once all the
transfers are complete, the strong locker queues for the lock using
the main mechanism, which now includes all possible conflicting locks.
Again, if we blow up while waiting for the lock, the only extra thing
we need to do is unwind the strong lock count acquisition.
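
To make the unwind discipline concrete, here is a minimal sketch of
the flow described above. It is NOT the real lock.c code: the names
(StrongLockCounts, TransferFastPathLocks, WaitForLockViaMainTable),
the partition count, and the stub bodies are hypothetical stand-ins;
in the real backend, error recovery happens via longjmp out of
elog(ERROR), so the same decrement has to run in that path as well.

/*
 * Hypothetical sketch of the strong-lock unwind discipline.  The only
 * cleanup the strong locker owes on any failure path is undoing its
 * bump of the strong lock count.
 */
#include <stdbool.h>

#define NUM_STRONG_LOCK_PARTITIONS 16

/* Shared counters, one per hash partition of the lock tag (stand-in). */
static int StrongLockCounts[NUM_STRONG_LOCK_PARTITIONS];

/* Stub: move every backend's fast-path locks on this tag to the main table. */
static bool
TransferFastPathLocks(unsigned int hashcode)
{
    (void) hashcode;
    return true;                /* pretend the transfer succeeded */
}

/* Stub: queue for the lock through the main lock table. */
static bool
WaitForLockViaMainTable(unsigned int hashcode, bool dontWait)
{
    (void) hashcode;
    (void) dontWait;
    return true;                /* pretend the acquisition succeeded */
}

static bool
AcquireStrongLockSketch(unsigned int hashcode, bool dontWait)
{
    int partition = hashcode % NUM_STRONG_LOCK_PARTITIONS;

    /* Step 1: bump the count so new fast-path acquisitions of this tag
     * are forced through the main lock table. */
    StrongLockCounts[partition]++;

    /* Step 2: migrate existing fast-path locks into the main table.  Weak
     * lockers already know how to release a transferred lock, so if this
     * fails part-way through, our only cleanup is undoing the bump. */
    if (!TransferFastPathLocks(hashcode))
    {
        StrongLockCounts[partition]--;
        return false;
    }

    /* Step 3: queue for the lock via the main mechanism, which now sees
     * every possible conflicting lock.  Giving up (e.g. a conditional
     * acquire of a contended lock) also unwinds the bump. */
    if (!WaitForLockViaMainTable(hashcode, dontWait))
    {
        StrongLockCounts[partition]--;
        return false;
    }

    /* Success: the count stays elevated until the strong lock is released. */
    return true;
}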

Of course, I may be missing some other kind of bug altogether...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
