Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Mahendra Singh Thalor <mahi6run(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Mithun Cy <mithun(dot)cy(at)enterprisedb(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
Date: 2020-03-05 06:44:46
Message-ID: CAA4eK1LEwuEP=JqCNjwjqna7hZ=1HSyhjFmJ++Q66wY6B7rstA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 4, 2020 at 11:45 AM Mahendra Singh Thalor
<mahi6run(at)gmail(dot)com> wrote:
>
> On Mon, 24 Feb 2020 at 15:39, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Thu, Feb 20, 2020 at 8:06 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > >
> > > What I'm advocating is that extension locks should continue to go
> > > through lock.c. And yes, that requires some changes to group locking,
> > > but I still don't see why they'd be complicated.
> > >
> >
> > Fair position, as per initial analysis, I think if we do below three
> > things, it should work out without changing to a new way of locking
> > for relation extension or page type locks.
> > a. As per the discussion above, ensure in code we will never try to
> > acquire another heavy-weight lock after acquiring relation extension
> > or page type locks (probably by having Asserts in code or maybe some
> > other way).
> > b. Change lock.c so that group locking is not considered for these two
> > lock types. For ex. in LockCheckConflicts, along with the check (if
> > (proclock->groupLeader == MyProc && MyProc->lockGroupLeader == NULL)),
> > we also check lock->tag and call it a conflict for these two locks.
> > c. The deadlock detector can ignore checking these two types of locks
> > because point (a) ensures that those won't lead to deadlock. One idea
> > could be that FindLockCycleRecurseMember just ignores these two types
> > of locks by checking the lock tag.
>
> Thanks Amit for summary.
>
> Based on above 3 points, here attaching 2 patches for review.
>
> 1. v01_0001-Conflict-EXTENTION-lock-in-group-member.patch (Patch by Dilip Kumar)
> Basically this patch is for point b and c.
>
> 2. v01_0002-Added-assert-to-verify-that-we-never-try-to-take-any.patch
> (Patch by me)
> This patch is for point a.
>
> After applying both the patches, make check-world is passing.
>
> We are testing both the patches and will post results.
>

I think we need to do detailed code review in the places where we are
taking Relation Extension Lock and see whether we are acquiring
another heavy-weight lock after that. It seems to me that in
brin_getinsertbuffer, after acquiring Relation Extension Lock, we
might again try to acquire the same lock. See
brin_initialize_empty_new_buffer which is called after acquiring
Relation Extension Lock, in that function, we call
RecordPageWithFreeSpace and that can again try to acquire the same
lock if it needs to perform fsm_extend. I think there will be similar
instances in the code. I think it is fine if we again try to acquire
it, but the current assertion in your patch needs to be adjusted for
that.

Few other minor comments on
v01_0002-Added-assert-to-verify-that-we-never-try-to-take-any:
1. Ideally, this should be the first patch as we first need to ensure
that we don't take any heavy-weight locks after acquiring a relation
extension lock.

2. I think it is better to add an Assert after initial error checks
(after RecoveryInProgress().. check)

3.
+ Assert (locallock->tag.lock.locktag_type != LOCKTAG_RELATION_EXTEND ||
+ locallock->nLocks == 0);

I think it is not possible that we have an entry in
LockMethodLocalHash and its value is zero. Do you see any such
possibility, if not, then we might want to remove it?

4. We already have a macro for LOCALLOCK_LOCKMETHOD, can we write
another one tag type? This will make the check look a bit cleaner and
probably if we need to extend it in future for Page type locks, then
also it will be good.

5. I have also tried to think of another way to check if we already
hold lock type LOCKTAG_RELATION_EXTEND, but couldn't come up with a
cheaper way than this. Basically, I think if we traverse the
MyProc->myProcLocks queue, we will get this information, but that
doesn't seem much cheaper than this.

6. Another thing that could be possible is to make this a test and
elog so that it can hit in production scenarios, but I think the cost
of that will be high unless we have a very simple way to write this
test condition.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2020-03-05 07:06:50 Re: Make mesage at end-of-recovery less scary.
Previous Message David Rowley 2020-03-05 06:40:36 Re: Berserk Autovacuum (let's save next Mandrill)