Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
Date: 2017-12-08 08:20:10
Message-ID: CAD21AoBS-ckxod0YKz3+OT=i9VYq=34rq22uhWGc+0uTpch8=A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

On Sat, Dec 2, 2017 at 3:28 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Dec 1, 2017 at 10:14 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>> The patched is 10 times faster than current HEAD.
>
> Nifty.

Thank you for your dedicated reviewing the patch.

> The first hunk in monitoring.sgml looks unnecessary.

You meant the following hunk?

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 8d461c8..7aa7981 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -669,8 +669,8 @@ postgres 27093 0.0 0.0 30096 2752 ?
Ss 11:34 0:00 postgres: ser
Heavyweight locks, also known as lock manager locks or simply locks,
primarily protect SQL-visible objects such as tables. However,
they are also used to ensure mutual exclusion for certain internal
- operations such as relation extension.
<literal>wait_event</literal> will
- identify the type of lock awaited.
+ operations such as waiting for a transaction to finish.
+ <literal>wait_event</literal> will identify the type of lock awaited.
</para>
</listitem>
<listitem>

I think that since the extension locks are no longer a part of
heavyweight locks we should change the explanation.

> The second hunk breaks the formatting of the documentation; you need
> to adjust the "morerows" value from 9 to 8 here:
>
> <entry morerows="9"><literal>Lock</literal></entry>
>
> And similarly make this one 18:
>
> <entry morerows="17"><literal>IPC</literal></entry>

Fixed.

> +* Relation extension locks. The relation extension lock manager is
> +specialized in relation extensions. In PostgreSQL 11 relation extension
> +lock has been moved out of regular lock. It's similar to regular locks
> +but doesn't have full dead lock detection, group locking and multiple
> +lock modes. When conflicts occur, lock waits are implemented using
> +condition variables.
>
> Higher up, it says that "Postgres uses four types of interprocess
> locks", but because you added this, it's now a list of five items.

Fixed.

> I suggest moving the section on relation locks to the end and
> rewriting the text here as follows: Only one process can extend a
> relation at a time; we use a specialized lock manager for this
> purpose, which is much simpler than the regular lock manager. It is
> similar to the lightweight lock mechanism, but is ever simpler because
> there is only one lock mode and only one lock can be taken at a time.
> A process holding a relation extension lock is interruptible, unlike a
> process holding an LWLock.

Agreed and fixed.

> +#define RelExtLockTargetTagToIndex(relextlock_tag) \
> + (tag_hash((const void *) relextlock_tag, sizeof(RelExtLockTag)) \
> + % N_RELEXTLOCK_ENTS)
>
> How about using a static inline function for this?

Fixed.

> +#define SET_RELEXTLOCK_TAG(locktag, d, r) \
> + ((locktag).dbid = (d), \
> + (locktag).relid = (r))
>
> How about getting rid of this and just doing the assignments instead?

Fixed.

> +#define RELEXT_VAL_LOCK ((uint32) ((1 << 25)))
> +#define RELEXT_LOCK_MASK ((uint32) ((1 << 25)))
>
> It seems confusing to have two macros for the same value and an
> almost-interchangeable purpose. Maybe just call it RELEXT_LOCK_BIT?

Fixed.

>
> +RelationExtensionLockWaiterCount(Relation relation)
>
> Hmm. This is sort of problematic, because with then new design we
> have no guarantee that the return value is actually accurate. I don't
> think that's a functional problem, but the optics aren't great.

Yeah, with this patch we could overestimate it and then add extra
blocks to the relation. Since the number of extra blocks is capped at
512 I think it would not become serious problem.

> + if (held_relextlock.nLocks > 0)
> + {
> + RelExtLockRelease(held_relextlock.relid, true);
> + }
>
> Excess braces.

Fixed.

>
> +int
> +RelExtLockHoldingLockCount(void)
> +{
> + return held_relextlock.nLocks;
> +}
>
> Maybe IsAnyRelationExtensionLockHeld(), returning bool?

Fixed.

> + /* If the lock is held by me, no need to wait */
>
> If we already hold the lock, no need to wait.

Fixed.

> + * Luckily if we're trying to acquire the same lock as what we
> + * had held just before, we don't need to get the entry from the
> + * array by hashing.
>
> We're not trying to acquire a lock here. "If the last relation
> extension lock we touched is the same one for which we now need to
> wait, we can use our cached pointer to the lock instead of recomputing
> it."

Fixed.

> + registered_wait_list = true;
>
> Isn't it really registered_wait_count? The only list here is
> encapsulated in the CV.

Changed to "waiting".

>
> + /* Before retuning, decrement the wait count if we had been waiting */
>
> returning -> returning, but I'd rewrite this as "Release any wait
> count we hold."

Fixed.

> + * Acquire the relation extension lock. If we're trying to acquire the same
> + * lock as what already held, we just increment nLock locally and return
> + * without touching the RelExtLock array.
>
> "Acquire a relation extension lock." I think you can forget the rest
> of this; it duplicates comments in the function body.

Fixed.

> + * Since we don't support dead lock detection for relation extension
> + * lock and don't control the order of lock acquisition, it cannot not
> + * happen that trying to take a new lock while holding an another lock.
>
> Since we don't do deadlock detection, caller must not try to take a
> new relation extension lock while already holding them.

Fixed.

>
> + if (relid == held_relextlock.relid)
> + {
> + held_relextlock.nLocks++;
> + return true;
> + }
> + else
> + elog(ERROR,
> + "cannot acquire relation extension locks for
> multiple relations at the same");
>
> I'd prefer if (relid != held_relextlock.relid) elog(ERROR, ...) to
> save a level of indentation for the rest.

Fixed.

>
> + * If we're trying to acquire the same lock as what we just released
> + * we don't need to get the entry from the array by hashing. we expect
> + * to happen this case because it's a common case in acquisition of
> + * relation extension locks.
>
> "If the last relation extension lock we touched is the same one for we
> now need to acquire, we can use our cached pointer to the lock instead
> of recomputing it. This is likely to be a common case in practice."

Fixed.

>
> + /* Could not got the lock, return iff in conditional locking */
>
> "locking conditionally"

Fixed.

> + ConditionVariableSleep(&(relextlock->cv),
> WAIT_EVENT_RELATION_EXTENSION_LOCK);
> Break line at comma
>

Fixed.

> + /* Decrement wait count if we had been waiting */
>
> "Release any wait count we hold."

Fixed.

> + /* Always return true if not conditional lock */
>
> "We got the lock!"

Fixed.

> + /* If force releasing, release all locks we're holding */
> + if (force)
> + held_relextlock.nLocks = 0;
> + else
> + held_relextlock.nLocks--;
> +
> + Assert(held_relextlock.nLocks >= 0);
> +
> + /* Return if we're still holding the lock even after computation */
> + if (held_relextlock.nLocks > 0)
> + return;
>
> I thought you were going to have the caller adjust nLocks?

Yeah, I was supposed to change so but since we always release either
one lock or all relext locks I thought it'd better to pass a bool
rather than an int.

> + /* Get RelExtLock entry from the array */
> + SET_RELEXTLOCK_TAG(tag, MyDatabaseId, relid);
> + relextlock = &RelExtLockArray[RelExtLockTargetTagToIndex(&tag)];
>
> This seems to make no sense in RelExtLockRelease -- isn't the cache
> guaranteed valid?

Right, fixed.

>
> + /* Wake up waiters if there is someone looking at this lock */
>
> "If there may be waiters, wake them up."

Fixed.

> + * We allow to take a relation extension lock after took a
> + * heavy-weight lock. However, since we don't have dead lock
> + * detection mechanism between heavy-weight lock and relation
> + * extension lock it's not allowed taking an another heavy-weight
> + * lock while holding a relation extension lock.
>
> "Relation extension locks don't participate in deadlock detection, so
> make sure we don't try to acquire a heavyweight lock while holding
> one."

Fixed.

> + /* Release relation extension locks */
>
> "If we hold a relation extension lock, release it."

Fixed.

> +/* Number of partitions the shared relation extension lock tables are
> divided into */
> +#define LOG2_NUM_RELEXTLOCK_PARTITIONS 4
> +#define NUM_RELEXTLOCK_PARTITIONS (1 << LOG2_NUM_RELEXTLOCK_PARTITIONS)
>
> Dead code.

Fixed.

> When I took a break from sitting at the computer, I realized that I
> think this has a more serious problem: won't it permanently leak
> reference counts if someone hits ^C or an error occurs while the lock
> is held? I think it will -- it probably needs to do cleanup at the
> places where we do LWLockReleaseAll() that includes decrementing the
> shared refcount if necessary, rather than doing cleanup at the places
> we release heavyweight locks.
> I might be wrong about the details here -- this is off the top of my head.

Good catch. It can leak reference counts if someone hits ^C or an
error occurs while waiting. Fixed in the latest patch. But since
RelExtLockReleaseAll() is called even when such situations I think we
don't need to change the place where releasing the all relext lock. We
just moved it from heavyweight locks. Am I missing something?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment Content-Type Size
Moving_extension_lock_out_of_heavyweight_lock_v10.patch application/octet-stream 42.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrey Borodin 2017-12-08 08:21:17 Re: CUBE seems a bit confused about ORDER BY
Previous Message Craig Ringer 2017-12-08 08:03:48 Re: [HACKERS] Logical decoding on standby