Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

From: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: konstantin knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
Date: 2018-06-05 14:05:32
Message-ID: CAPpHfdsvg=r_Uf=aS4FA3qXi8bw=AWOX0Bkhe7i+nzqoNctMdQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jun 5, 2018 at 4:02 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2018-06-05 13:09:08 +0300, Alexander Korotkov wrote:
> > It appears that buffer replacement happening under the relation
> > extension lock is affected by starvation on exclusive buffer mapping
> > lwlocks and buffer content lwlocks, caused by many concurrent shared
> > lockers. So, the fair lwlock patch has no direct influence on the relation
> > extension lock, which naturally isn't even an lwlock...
>
> Yea, that makes sense. I wonder how much the fix here is to "pre-clear"
> a victim buffer, and how much is a saner buffer replacement
> implementation (either by going away from O(NBuffers), or by having a
> queue of clean victim buffers like my bgwriter replacement).

The particular thing I observed in our environment was BufferAlloc()
waiting for hours on a buffer partition lock. Increasing NUM_BUFFER_PARTITIONS
didn't help significantly. It appears that a very hot page (the root page of
some frequently used index) resided in that partition, so the partition was
continuously held under shared lock. So, in order to resolve this without
changing LWLock, we would probably have to move our buffer mapping hash table
to something lockless.
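
To make that contention pattern concrete, here is a minimal standalone
sketch (not the real buf_table.c/bufmgr.c code; FakeBufferTag and
fake_hash() are made-up stand-ins for BufferTag, BufTableHashCode() and
BufMappingPartitionLock()) showing why lookups of a single hot page always
land on the same partition lock, no matter how many partitions there are:

/*
 * Minimal sketch of how a buffer lookup maps to one of
 * NUM_BUFFER_PARTITIONS partition locks.  All names here are
 * illustrative stand-ins, not the server's actual code.
 */
#include <stdint.h>
#include <stdio.h>

#define NUM_BUFFER_PARTITIONS 128   /* same value as in lwlock.h */

typedef struct FakeBufferTag
{
    uint32_t spcOid;     /* tablespace */
    uint32_t dbOid;      /* database */
    uint32_t relNumber;  /* relation */
    uint32_t forkNum;    /* fork */
    uint32_t blockNum;   /* block within relation */
} FakeBufferTag;

/* stand-in for hashing the buffer tag (hash_any() in the server) */
static uint32_t
fake_hash(const FakeBufferTag *tag)
{
    uint32_t vals[5] = {tag->spcOid, tag->dbOid, tag->relNumber,
                        tag->forkNum, tag->blockNum};
    uint32_t h = 0x811c9dc5;

    for (int i = 0; i < 5; i++)
    {
        h ^= vals[i];
        h *= 0x01000193;        /* FNV-1a style mixing */
    }
    return h;
}

int
main(void)
{
    /* the "hot" page: e.g. the root page of a frequently used index */
    FakeBufferTag hot = {1663, 16384, 16385, 0, 0};

    /*
     * Every backend looking up this page computes the same hash and
     * therefore takes the same partition lock, regardless of how large
     * NUM_BUFFER_PARTITIONS is made.
     */
    uint32_t partition = fake_hash(&hot) % NUM_BUFFER_PARTITIONS;

    printf("hot page always maps to partition %u of %d\n",
           partition, NUM_BUFFER_PARTITIONS);
    return 0;
}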

> > I'll post the fair lwlock patch in a separate thread. It requires detailed
> > consideration and benchmarking, because there is a risk of regression
> > on specific workloads.
>
> I bet that doing it naively will regress massively in a number of cases.

Yes, I suspect the same. However, I tend to think that something is wrong
with LWLock itself. It seems to be the only one of our locks that can starve
some lockers almost indefinitely under certain workloads. In contrast, even
our SpinLock gives all waiting processes nearly the same chance to acquire it.
So, I think the idea of improving LWLock in this respect deserves at least
further investigation.
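
To illustrate what I mean by starvation, here is a toy standalone sketch
(SketchLWLock and everything else in it are made up for illustration, not
lwlock.c) of a reader-biased fast path, where shared acquisitions keep
succeeding no matter how long an exclusive waiter has been queued:

/*
 * Toy sketch of a reader-biased fast path.  The shared path never looks
 * at queued exclusive waiters, so a steady stream of shared lockers can
 * starve an exclusive one indefinitely.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define EXCLUSIVE_FLAG  (1u << 31)   /* high bit: exclusive holder present;
                                      * low bits: count of shared holders */

typedef struct SketchLWLock
{
    _Atomic uint32_t state;
} SketchLWLock;

/* Shared fast path: succeeds whenever no exclusive holder is present. */
static bool
sketch_try_shared(SketchLWLock *lock)
{
    uint32_t old = atomic_load(&lock->state);

    while ((old & EXCLUSIVE_FLAG) == 0)
    {
        /* Note: queued exclusive waiters are never consulted here. */
        if (atomic_compare_exchange_weak(&lock->state, &old, old + 1))
            return true;
    }
    return false;
}

/* Exclusive fast path: succeeds only when nobody holds the lock at all. */
static bool
sketch_try_exclusive(SketchLWLock *lock)
{
    uint32_t expected = 0;

    return atomic_compare_exchange_strong(&lock->state, &expected,
                                          EXCLUSIVE_FLAG);
}

int
main(void)
{
    SketchLWLock lock;

    atomic_init(&lock.state, 0);

    (void) sketch_try_shared(&lock);           /* one shared holder */
    bool excl = sketch_try_exclusive(&lock);   /* fails: shared held */
    bool more = sketch_try_shared(&lock);      /* still succeeds */

    printf("exclusive acquired: %d, another shared acquired: %d\n",
           excl, more);
    return 0;
}

A fairer variant would have to make the shared fast path respect queued
exclusive waiters, which is exactly where the regression risk on specific
workloads comes from.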

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
