Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Mithun Cy <mithun(dot)cy(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
Subject: Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
Date: 2020-02-12 10:22:26
Message-ID: CAA4eK1K6xPvxBSs8K7QYpCD6xNdPrDJUsqJZHcNgk5Yp--_0eA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Feb 12, 2020 at 10:24 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Hi,
>
> On 2020-02-11 08:01:34 +0530, Amit Kapila wrote:
> > I don't see much downside with the patch, rather there is a
> > performance increase of 3-9% in various scenarios.
>
> As I wrote in [1] I started to look at this patch. My problem with itis
> that it just seems like the wrong direction architecturally to
> me. There's two main aspects to this:
>
> 1) It basically builds a another, more lightweight but less capable, of
> a lock manager that can lock more objects than we can have distinct
> locks for. It is faster because it uses *one* hashtable without
> conflict handling, because it has fewer lock modes, and because it
> doesn't support detecting deadlocks. And probably some other things.
>
> 2) A lot of the contention around file extension comes from us doing
> multiple expensive things under one lock (determining current
> relation size, searching victim buffer, extending file), and in tiny
> increments (growing a 1TB table by 8kb). This patch doesn't address
> that at all.
>

It seems to me both the two points try to address the performance
angle of the patch, but here our actual intention was to make this
lock block among parallel workers so that we can implement/improve
some of the parallel writes operations (like parallelly vacuuming the
heap or index, parallel bulk load, etc.). Both independently are
worth accomplishing, but not w.r.t parallel writes. Here, we were
doing some benchmarking to see if we haven't regressed performance in
any cases.

> I've focused on 1) in the email referenced above ([1]). Here I'll focus
> on 2).
>
>
>
> Which, I think, pretty clearly shows a few things:
>

I agree with all your below observations.

> 1) It's crucial to move acquiring a victim buffer to the outside of the
> extension lock, as for copy acquiring the victim buffer will commonly
> cause a buffer having to be written out, due to the ringbuffer. This
> is even more crucial when using a logged table, as the writeout then
> also will often also trigger a WAL flush.
>
> While doing so will sometimes add a round of acquiring the buffer
> mapping locks, having to do the FlushBuffer while holding the
> extension lock is a huge problem.
>
> This'd also move a good bit of the cost of finding (i.e. clock sweep
> / ringbuffer replacement) and invalidating the old buffer mapping out
> of the lock.
>

I think this mostly because of the way currently code is arranged to
extend a block via ReadBuffer* API. IIUC, currently the main
operations under relation extension lock are as follows:
a. get the block number for extension via smgrnblocks.
b. find victim buffer
c. associate buffer with the block no. found in step-a.
d. initialize the block with zeros
e. write the block
f. PageInit

I think if we can rearrange such that steps b and c can be done after
e or f, then we don't need to hold the extension lock to find the
victim buffer.

> 2) We need to make the smgrwrite more efficient, it is costing a lot of
> time. A small additional experiment shows the cost of doing 8kb
> writes:
>
> I wrote a small program that just iteratively writes a 32GB file:
>
..
>
>
> So it looks like extending the file with posix_fallocate() might be a
> winner, but only if we actually can do so in larger chunks than 8kb
> at once.
>

A good experiment and sounds like worth doing.

>
>
> 3) We should move the PageInit() that's currently done with the
> extension lock held, to the outside. Since we get the buffer with
> RBM_ZERO_AND_LOCK these days, that should be safe. Also, we don't
> need to zero the entire buffer both in RelationGetBufferForTuple()'s
> PageInit(), and in ReadBuffer_common() before calling smgrextend().
>

Agreed.

I feel all three are independent improvements and can be done separately.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2020-02-12 10:54:10 Re: Add PostgreSQL home page to --help output
Previous Message Hubert Zhang 2020-02-12 10:12:36 Re: Yet another vectorized engine