Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
Date: 2017-11-30 11:20:32
Message-ID: CAD21AoBZ6Ex2qWRwzvS-q1+pVMyZY52TMskC-oChLgy7k3p7Kg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Nov 29, 2017 at 5:33 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Sun, Nov 26, 2017 at 9:33 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>> Attached latest patch incorporated all comments so far. Please review it.
>
> I think you only need RelExtLockReleaseAllI() where we currently have
> LockReleaseAll(DEFAULT_LOCKMETHOD, ...) not where we have
> LockReleaseAll(USER_LOCKMETHOD, ...). That's because relation
> extension locks use the default lock method, not USER_LOCKMETHOD.

Fixed.

> You need to update the table of wait events in the documentation.
> Please be sure to actually build the documentation afterwards and make
> sure it looks OK. Maybe the way event name should be
> RelationExtensionLock rather than just RelationExtension; we are not
> waiting for the extension itself.

Fixed. I added both new wait_event and wait_event_type for relext
lock. Also checked to pass a documentation build.

>
> You have a typo/thinko in lmgr/README: confliction is not a word.
> Maybe you mean "When conflicts occur, lock waits are implemented using
> condition variables."

Fixed.

>
> Instead of having shared and exclusive locks, how about just having
> exclusive locks and introducing a new primitive operation that waits
> for the lock to be free and returns without acquiring it? That is
> essentially what brin_pageops.c is doing by taking and releasing the
> shared lock, and it's the only caller that takes anything but an
> exclusive lock. This seems like it would permit a considerable
> simplification of the locking mechanism, since there would then be
> only two possible states: 1 (locked) and 0 (not locked).

I think it's a good idea. With this change, the concurrency of
executing brin_page_cleanup() would get decreased. But since
brin_page_cleanup() is called only during vacuum so far it's no
problem. I think we can process the code in vacuumlazy.c in the same
manner as well. I've changed the patch so that it has only exclusive
locks and introduces WaitForRelationExtensionLockToBeFree() function
to wait for the the lock to be free.

Also, now that we got rid of shared locks, I gathered lock state and
pin count into a atomic uint32.

> In RelExtLockAcquire, I can't endorse this sort of coding:
>
> + if (relid == held_relextlock.lock->relid &&
> + lockmode == held_relextlock.mode)
> + {
> + held_relextlock.nLocks++;
> + return true;
> + }
> + else
> + Assert(false); /* cannot happen */
>
> Either convert the Assert() to an elog(), or change the if-statement
> to an Assert() of the same condition. I'd probably vote for the first
> one. As it is, if that Assert(false) is ever hit, chaos will (maybe)
> ensue. Let's make sure we nip any such problems in the bud.

Agreed, fixed.

>
> "successed" is not a good variable name; that's not an English word.

Fixed.

> + /* Could not got the lock, return iff in conditional locking */
> + if (mustwait && conditional)
>
> Comment contradicts code. The comment is right; the code need not
> test mustwait, as that's already been done.

Fixed.

> The way this is hooked into the shared-memory initialization stuff
> looks strange in a number of ways:
>
> - Apparently, you're making initialize enough space for as many
> relation extension locks as the save of the main heavyweight lock
> table, but that seems like overkill. I'm not sure how much space we
> actually need for relation extension locks, but I bet it's a lot less
> than we need for regular heavyweight locks.

Agreed. The maximum of the number of relext locks is the number of
relations on a database cluster, it's not relevant with the number of
clients. Currently NLOCKENTS() counts the number of locks including
relation extension lock. One idea is to introduce a new GUC to control
the memory size, although the total memory size for locks will get
increased. Probably we can make it behave similar to
max_pred_locks_per_relation. Or, in order to not change total memory
size for lock even after moved it out of heavyweight lock, we can
divide NLOCKENTS() into heavyweight lock and relation extension lock
(for example, 80% for heavyweight locks and 20% relation extension
locks). But the latter would make parameter tuning hard. I'd vote for
the first one to keep it simple. Any ideas? This part is not fixed in
the patch yet.

> - The error message emitted when you run out of space also claims that
> you can fix the issue by raising max_pred_locks_per_transaction, but
> that has no effect on the size of the main lock table or this table.

Fixed.

> - The changes to LockShmemSize() suppose that the hash table elements
> have a size equal to the size of an LWLock, but the actual size is
> sizeof(RELEXTLOCK).

Fixed.

> - I don't really know why the code for this should be daisy-chained
> off of the lock.c code inside of being called from
> CreateSharedMemoryAndSemaphores() just like (almost) all of the other
> subsystems.

Fixed.

>
> This code ignores the existence of multiple databases; RELEXTLOCK
> contains a relid, but no database OID. That's easy enough to fix, but
> it actually causes no problem unless, by bad luck, you have two
> relations with the same OID in different databases that are both being
> rapidly extended at the same time -- and even then, it's only a
> performance problem, not a correctness problem. In fact, I wonder if
> we shouldn't go further: instead of creating these RELEXTLOCK
> structures dynamically, let's just have a fixed number of them, say
> 1024. When we get a request to take a lock, hash <dboid, reloid> and
> take the result modulo 1024; lock the RELEXTLOCK at that offset in the
> array.
>

Attached the latest patch incorporated comments except for the fix of
the memory size for relext lock.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment Content-Type Size
Moving_extension_lock_out_of_heavyweight_lock_v8.patch application/octet-stream 48.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2017-11-30 11:40:15 Re: [HACKERS] Refactoring identifier checks to consistently use strcmp
Previous Message Simon Riggs 2017-11-30 10:48:20 Re: [HACKERS] Issues with logical replication