Re: Moving relation extension locks out of heavyweight lock manager

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Moving relation extension locks out of heavyweight lock manager
Date: 2017-10-26 15:03:45
Message-ID: CA+TgmoZbaP_GLgMgiEEXD2oLgMbMVjoK03cvb-on9QwUhmz2tQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Oct 26, 2017 at 12:36 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> Since the previous patch conflicts with current HEAD, I attached the
> updated patch for next CF.

I think we should back up here and ask ourselves a couple of questions:

1. What are we trying to accomplish here?

2. Is this the best way to accomplish it?

To the first question, the problem as I understand it is as follows:
Heavyweight locks don't conflict between members of a parallel group.
However, this is wrong for LOCKTAG_RELATION_EXTENSION, LOCKTAG_PAGE,
LOCKTAG_TUPLE, and LOCKTAG_SPECULATIVE_TOKEN. Currently, those cases
don't arise, because parallel operations are strictly read-only
(except for inserts by the leader into a just-created table, when only
one member of the group can be taking the lock anyway). However, once
we allow writes, they become possible, so some solution is needed.

To the second question, there are a couple of ways we could fix this.
First, we could continue to allow these locks to be taken in the
heavyweight lock manager, but make them conflict even between members
of the same lock group. This is, however, complicated. A significant
problem (or so I think) is that the deadlock detector logic, which is
already quite hard to test, will become even more complicated, since
wait edges between members of a lock group need to exist at some times
and not other times. Moreover, to the best of my knowledge, the
increased complexity would have no benefit, because it doesn't look to
me like we ever take any other heavyweight lock while holding one of
these four kinds of locks. Therefore, no deadlock can occur: if we're
waiting for one of these locks, the process that holds it is not
waiting for any other heavyweight lock. This gives rise to a second
idea: move these locks out of the heavyweight lock manager and handle
them with separate code that does not have deadlock detection and
doesn't need as many lock modes. I think that idea is basically
sound, although it's possibly not the only sound idea.

However, that makes me wonder whether we shouldn't be a bit more
aggressive with this patch: why JUST relation extension locks? Why
not all four types of locks listed above? Actually, tuple locks are a
bit sticky, because they have four lock modes. The other three kinds
are very similar -- all you can do is "take it" (implicitly, in
exclusive mode), "try to take it" (again, implicitly, in exclusive
mode), or "wait for it to be released" (i.e. share lock and then
release). Another idea is to try to handle those three types and
leave the tuple locking problem for another day.
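
Concretely, the whole API surface for such a mechanism could be as small
as this (sketch only; the type and function names here are made up for
illustration and are not taken from the patch):

#include "postgres.h"

typedef struct ExtLockTag
{
    Oid         dbid;
    Oid         relid;
} ExtLockTag;

/* "take it" -- block until we hold it, implicitly in exclusive mode */
extern void ExtLockAcquire(const ExtLockTag *tag);

/* "try to take it" -- return false instead of waiting */
extern bool ExtLockConditionalAcquire(const ExtLockTag *tag);

/* "wait for it to be released" -- the share-lock-then-release idiom */
extern void ExtLockWaitForFree(const ExtLockTag *tag);

extern void ExtLockRelease(const ExtLockTag *tag);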

I suggest that a good thing to do more or less immediately, regardless
of when this patch ends up being ready, would be to insert an
assertion that LockAcquire() is never called while holding a lock of
one of these types. If that assertion ever fails, then the whole
theory that these lock types don't need deadlock detection is wrong,
and we'd like to find out about that sooner rather than later.
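
Roughly, I'm imagining something like the following in LockAcquire()
itself (sketch only; the backend-local flag is hypothetical and would
have to be set and cleared wherever these lock types are taken and
released):

/*
 * Hypothetical backend-local flag: true while this backend holds a
 * relation extension, page, or speculative insertion lock.
 */
static bool holding_nodeadlock_lock = false;

LockAcquireResult
LockAcquire(const LOCKTAG *locktag, LOCKMODE lockmode,
            bool sessionLock, bool dontWait)
{
    /*
     * If this ever fires, the theory that these lock types never
     * participate in deadlocks is wrong.
     */
    Assert(!holding_nodeadlock_lock);

    return LockAcquireExtended(locktag, lockmode, sessionLock, dontWait, true);
}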

On the details of the patch, it appears that RelExtLockAcquire()
executes the wait-for-lock code with the partition lock held, and then
continues to hold the partition lock for the entire time that the
relation extension lock is held. That not only makes all code that
runs while holding the lock non-interruptible but makes a lot of the
rest of this code pointless. How is any of this atomics code going to
be reached by more than one process at the same time if the entire
bucket is exclusive-locked? I would guess that the concurrency is not
very good here for the same reason. Of course, just releasing the
bucket lock wouldn't be right either, because then ext_lock might go
away while we've got a pointer to it, which wouldn't be good. I think
you could make this work if each lock had both a locker count and a
pin count, and the object can only be removed when the pin count is 0.
So the lock algorithm would look like this:

- Acquire the partition LWLock.
- Find the item of interest, creating it if necessary. If out of
memory for more elements, sweep through the table and reclaim
0-pin-count entries, then retry.
- Increment the pin count.
- Attempt to acquire the lock atomically; if we succeed, release the
partition lock and return.
- If this was a conditional-acquire, then decrement the pin count,
release the partition lock and return.
- Release the partition lock.
- Sleep on the condition variable until we manage to atomically
acquire the lock.

The unlock algorithm would just release the lock, decrement the pin
count and, if the resulting pin count is non-zero, broadcast on the
condition variable.
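
In code, that might look roughly like this (sketch only: the entry
layout, hash table, partition-lock mapping, and wait event are all made
up, and the reclaim-on-overflow step is omitted). Since these locks only
come in one mode, I've collapsed the locker count down to a single bit
kept in the same atomic word as the pin count, so one fetch-or can take
the lock:

#include "postgres.h"

#include "port/atomics.h"
#include "storage/condition_variable.h"
#include "storage/lwlock.h"
#include "utils/hsearch.h"

#define RELEXT_LOCK_BIT         ((uint32) 1 << 31)  /* locker "count" as one bit */
#define WAIT_EVENT_RELEXT_SKETCH 0                  /* hypothetical wait event */

typedef struct RelExtLockTag
{
    Oid         dbid;
    Oid         relid;
} RelExtLockTag;

typedef struct RelExtLockEnt
{
    RelExtLockTag tag;          /* hash key */
    pg_atomic_uint32 state;     /* lock bit in the high bit, pin count below */
    ConditionVariable cv;
} RelExtLockEnt;

static HTAB *RelExtLockHash;    /* hypothetical shared table, set up at startup */

static LWLock *RelExtLockPartition(RelExtLockTag tag);  /* hypothetical mapping */

static bool
RelExtLockAcquireSketch(RelExtLockTag tag, bool conditional)
{
    LWLock     *partition_lock = RelExtLockPartition(tag);
    RelExtLockEnt *ent;
    bool        found;

    LWLockAcquire(partition_lock, LW_EXCLUSIVE);
    ent = (RelExtLockEnt *) hash_search(RelExtLockHash, &tag, HASH_ENTER, &found);
    if (!found)
    {
        pg_atomic_init_u32(&ent->state, 0);
        ConditionVariableInit(&ent->cv);
    }

    /* Pin the entry so it can't be reclaimed while we're looking at it. */
    pg_atomic_fetch_add_u32(&ent->state, 1);

    /* Attempt to take the lock atomically. */
    if ((pg_atomic_fetch_or_u32(&ent->state, RELEXT_LOCK_BIT) & RELEXT_LOCK_BIT) == 0)
    {
        LWLockRelease(partition_lock);
        return true;
    }

    if (conditional)
    {
        pg_atomic_fetch_sub_u32(&ent->state, 1);    /* drop the pin and give up */
        LWLockRelease(partition_lock);
        return false;
    }

    LWLockRelease(partition_lock);

    /* Sleep on the condition variable until the fetch-or wins. */
    ConditionVariablePrepareToSleep(&ent->cv);
    while ((pg_atomic_fetch_or_u32(&ent->state, RELEXT_LOCK_BIT) & RELEXT_LOCK_BIT) != 0)
        ConditionVariableSleep(&ent->cv, WAIT_EVENT_RELEXT_SKETCH);
    ConditionVariableCancelSleep();
    return true;
}

static void
RelExtLockReleaseSketch(RelExtLockEnt *ent)
{
    /* Clear the lock bit and drop our pin in one atomic subtraction. */
    uint32      state = pg_atomic_sub_fetch_u32(&ent->state, RELEXT_LOCK_BIT + 1);

    /* Any remaining pins belong to waiters (or soon-to-be waiters); wake them. */
    if ((state & ~RELEXT_LOCK_BIT) != 0)
        ConditionVariableBroadcast(&ent->cv);
}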

Although I think this will work, I'm not sure this is actually a great
algorithm. Every lock acquisition has to take and release the
partition lock, use at least two more atomic ops (to take the pin and
the lock), and search a hash table. I don't think that's going to be
staggeringly fast. Maybe it's OK. It's not that much worse, possibly
not any worse, than what the main lock manager does now. However,
especially if we implement a solution specific to relation locks, it
seems like it would be better if we could somehow optimize based on
the facts that (1) many relation locks will not conflict and (2) it's
very common for the same backend to take and release the same
extension lock over and over again. I don't have a specific proposal
right now.

Whatever we end up with, I think we should write some kind of a test
harness to benchmark the number of acquire/release cycles per second
that we can do with the current relation extension lock system vs. the
proposed new system. Ideally, we'd be faster, since we're proposing a
more specialized mechanism. But at least we should not be slower.
pgbench isn't a good test because the relation extension lock will
barely be taken, let alone contended; we need to check something like
parallel copies into the same table to see any effect.
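
For the raw cycles-per-second number, even a trivial loadable C function
that hammers the existing API in a loop would tell us something (sketch
only; the function name and module are made up):

#include "postgres.h"

#include "access/heapam.h"
#include "fmgr.h"
#include "portability/instr_time.h"
#include "storage/lmgr.h"
#include "utils/rel.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(bench_relext_lock);

Datum
bench_relext_lock(PG_FUNCTION_ARGS)
{
    Oid         relid = PG_GETARG_OID(0);
    int64       loops = PG_GETARG_INT64(1);
    Relation    rel = relation_open(relid, AccessShareLock);
    instr_time  start,
                duration;
    int64       i;

    INSTR_TIME_SET_CURRENT(start);
    for (i = 0; i < loops; i++)
    {
        LockRelationForExtension(rel, ExclusiveLock);
        UnlockRelationForExtension(rel, ExclusiveLock);
    }
    INSTR_TIME_SET_CURRENT(duration);
    INSTR_TIME_SUBTRACT(duration, start);

    relation_close(rel, AccessShareLock);

    /* Acquire/release cycles per second. */
    PG_RETURN_FLOAT8((double) loops / INSTR_TIME_GET_DOUBLE(duration));
}

Running that from one session gives the uncontended number for the
current system versus the proposed one; the contended case is still
better exercised by the parallel-copy workload above.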

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
