Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
Date: 2017-12-11 04:51:57
Message-ID: CAD21AoDOmPAf-+P96hLC=mtDhPsA4uRGs53X8jwARAoRfbR=Zw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

On Sat, Dec 9, 2017 at 2:00 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Dec 8, 2017 at 3:20 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>> The first hunk in monitoring.sgml looks unnecessary.
>>
>> You meant the following hunk?
>>
>> diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
>> index 8d461c8..7aa7981 100644
>> --- a/doc/src/sgml/monitoring.sgml
>> +++ b/doc/src/sgml/monitoring.sgml
>> @@ -669,8 +669,8 @@ postgres 27093 0.0 0.0 30096 2752 ?
>> Ss 11:34 0:00 postgres: ser
>> Heavyweight locks, also known as lock manager locks or simply locks,
>> primarily protect SQL-visible objects such as tables. However,
>> they are also used to ensure mutual exclusion for certain internal
>> - operations such as relation extension.
>> <literal>wait_event</literal> will
>> - identify the type of lock awaited.
>> + operations such as waiting for a transaction to finish.
>> + <literal>wait_event</literal> will identify the type of lock awaited.
>> </para>
>> </listitem>
>> <listitem>
>>
>> I think that since the extension locks are no longer a part of
>> heavyweight locks we should change the explanation.
>
> Yes, you are right.
>
>>> +RelationExtensionLockWaiterCount(Relation relation)
>>>
>>> Hmm. This is sort of problematic, because with then new design we
>>> have no guarantee that the return value is actually accurate. I don't
>>> think that's a functional problem, but the optics aren't great.
>>
>> Yeah, with this patch we could overestimate it and then add extra
>> blocks to the relation. Since the number of extra blocks is capped at
>> 512 I think it would not become serious problem.
>
> How about renaming it EstimateNumberOfExtensionLockWaiters?

Agreed, fixed.

>
>>> + /* If force releasing, release all locks we're holding */
>>> + if (force)
>>> + held_relextlock.nLocks = 0;
>>> + else
>>> + held_relextlock.nLocks--;
>>> +
>>> + Assert(held_relextlock.nLocks >= 0);
>>> +
>>> + /* Return if we're still holding the lock even after computation */
>>> + if (held_relextlock.nLocks > 0)
>>> + return;
>>>
>>> I thought you were going to have the caller adjust nLocks?
>>
>> Yeah, I was supposed to change so but since we always release either
>> one lock or all relext locks I thought it'd better to pass a bool
>> rather than an int.
>
> I don't see why you need to pass either one. The caller can set
> held_relextlock.nLocks either with -- or = 0, and then call
> RelExtLockRelease() only if the resulting value is 0.

Fixed.

>
>>> When I took a break from sitting at the computer, I realized that I
>>> think this has a more serious problem: won't it permanently leak
>>> reference counts if someone hits ^C or an error occurs while the lock
>>> is held? I think it will -- it probably needs to do cleanup at the
>>> places where we do LWLockReleaseAll() that includes decrementing the
>>> shared refcount if necessary, rather than doing cleanup at the places
>>> we release heavyweight locks.
>>> I might be wrong about the details here -- this is off the top of my head.
>>
>> Good catch. It can leak reference counts if someone hits ^C or an
>> error occurs while waiting. Fixed in the latest patch. But since
>> RelExtLockReleaseAll() is called even when such situations I think we
>> don't need to change the place where releasing the all relext lock. We
>> just moved it from heavyweight locks. Am I missing something?
>
> Hmm, that might be an OK way to handle it. I don't see a problem off
> the top of my head. It might be clearer to rename it to
> RelExtLockCleanup() though, since it is not just releasing the lock
> but also any wait count we hold.

Yeah, it seems better. Fixed.

> +/* Must be greater than MAX_BACKENDS - which is 2^23-1, so we're fine. */
> +#define RELEXT_WAIT_COUNT_MASK ((uint32) ((1 << 24) - 1))
>
> Let's drop the comment here and instead add a StaticAssertStmt() that
> checks this.

Fixed. I added StaticAssertStmt() to InitRelExtLocks().

>
> I am slightly puzzled, though. If I read this correctly, bits 0-23
> are used for the waiter count, bit 24 is always 0, bit 25 indicates
> the presence or absence of an exclusive lock, and bits 26+ are always
> 0. That seems slightly odd. Shouldn't we either use the highest
> available bit for the locker (bit 31) or the lowest one (bit 24)? The
> former seems better, in case MAX_BACKENDS changes later. We could
> make RELEXT_WAIT_COUNT_MASK bigger too, just in case.

I agree with the former. Fixed.

> + /* Make a lock tag */
> + tag.dbid = MyDatabaseId;
> + tag.relid = relid;
>
> What about shared relations? I bet we need to use 0 in that case.
> Otherwise, if backends in two different databases try to extend the
> same shared relation at the same time, we'll (probably) fail to notice
> that they conflict.
>

You're right. I changed it so that we set invalidOId to tag.dbid if
the relation is shared relation.

> + * To avoid unnecessary recomputations of the hash code, we try to do this
> + * just once per function, and then pass it around as needed. we can
> + * extract the index number of RelExtLockArray.
>
> This is just a copy-and-paste from lock.c, but actually we have a more
> sophisticated scheme here. I think you can just drop this comment
> altogether, really.

Fixed.

>
> + return (tag_hash((const void *) locktag, sizeof(RelExtLockTag))
> + % N_RELEXTLOCK_ENTS);
>
> I would drop the outermost set of parentheses. Is the cast to (const
> void *) really doing anything?
>

Fixed.

> + "cannot acquire relation extension locks for
> multiple relations at the same");
>
> cannot simultaneously acquire more than one distinct relation lock?
> As you have it, you'd have to add the word "time" at the end, but my
> version is shorter.

I wanted to mean, cannot acquire relation extension locks for multiple
relations at the "time". Fixed.

>
> + /* Sleep until the lock is released */
>
> Really, there's no guarantee that the lock will be released when we
> wake up. I think just /* Sleep until something happens, then recheck
> */

Fixed.

> + lock_free = (oldstate & RELEXT_LOCK_BIT) == 0;
> + if (lock_free)
> + desired_state += RELEXT_LOCK_BIT;
> +
> + if (pg_atomic_compare_exchange_u32(&relextlock->state,
> + &oldstate, desired_state))
> + {
> + if (lock_free)
> + return false;
> + else
> + return true;
> + }
>
> Hmm. If the lock is not free, we attempt to compare-and-swap anyway,
> but then return false? Why not just lock_free = (oldstate &
> RELEXT_LOCK_BIT) == 0; if (!lock_free) return true; if
> (pg_atomic_compare_exchange(&relextlock->state, &oldstate, oldstate |
> RELEXT_LOCK_BIT)) return false;

Fixed.

>
> + Assert(IsAnyRelationExtensionLockHeld() == 0);
>
> Since this is return bool now, it should just be
> Assert(!IsAnyRelationExtensionLockHeld()).

Fixed.

Attached updated version patch. Please review it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment Content-Type Size
Moving_extension_lock_out_of_heavyweight_lock_v11.patch application/octet-stream 43.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2017-12-11 04:54:35 Re: [HACKERS] What does it mean by XLOG_BACKUP_RECORD?
Previous Message Rushabh Lathia 2017-12-11 04:49:06 Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)