Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Mahendra Singh Thalor <mahi6run(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Michael Paquier <michael(at)paquier(dot)xyz>, Mithun Cy <mithun(dot)cy(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
Subject: Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
Date: 2020-02-11 02:28:33
Message-ID: CAA4eK1JfR2k1uLvo2e50q1ZsvXtqgLCgxm41F_8LybiHJ-eVvA@mail.gmail.com
Lists: pgsql-hackers

On Mon, Feb 10, 2020 at 10:28 PM Mahendra Singh Thalor
<mahi6run(at)gmail(dot)com> wrote:
>
> On Sat, 8 Feb 2020 at 00:27, Mahendra Singh Thalor <mahi6run(at)gmail(dot)com> wrote:
> >
> > On Thu, 6 Feb 2020 at 09:44, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > >
> > > The numbers at the 56 and 74 client counts seem slightly
> > > suspicious. Can you please repeat those tests? Basically, I am not
> > > able to come up with a theory for why the performance with the
> > > patch is a bit lower at 56 clients and then higher at 74.
> >
> > Okay. I will repeat the test.
>
> I re-tested on a different machine because on the previous machine, the results were inconsistent.
>

Thanks for doing the detailed tests.

> My testing machine:
> $ lscpu
> Architecture: ppc64le
> Byte Order: Little Endian
> CPU(s): 192
> On-line CPU(s) list: 0-191
> Thread(s) per core: 8
> Core(s) per socket: 1
> Socket(s): 24
> NUMA node(s): 4
> Model: IBM,8286-42A
> L1d cache: 64K
> L1i cache: 32K
> L2 cache: 512K
> L3 cache: 8192K
> NUMA node0 CPU(s): 0-47
> NUMA node1 CPU(s): 48-95
> NUMA node2 CPU(s): 96-143
> NUMA node3 CPU(s): 144-191
>
> ./pgbench -c $threads -j $threads -T 180 -f insert1.sql@1 -f insert2.sql@1 -f insert3.sql@1 -f insert4.sql@1 postgres
>
> Clients  HEAD (tps)   With v14 patch (tps)  %change  (time: 180s)
> 1        41.491486    41.375532             -0.27%
> 32       335.138568   330.028739            -1.52%
> 56       353.783930   360.883710            +2.00%
> 60       341.741925   359.028041            +5.05%
> 64       338.521730   356.511423            +5.13%
> 66       339.838921   352.761766            +3.80%
> 70       339.305454   353.658425            +4.23%
> 74       332.016217   348.809042            +5.05%
>
> From the above results, it seems that there is very little regression with the patch (+-5%), which can be run-to-run variation.
>
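
For context, pgbench's "-f script@weight" syntax assigns each script a
relative weight, so the four insert scripts above are picked with equal
probability for each transaction. The scripts themselves were not
posted in this thread; a minimal sketch of what insert1.sql might look
like (the table name is made up) is:

-- hypothetical pgbench script; the real insert*.sql files were not posted
\set id random(1, 1000000)
INSERT INTO insert_tbl1 (id, data) VALUES (:id, repeat('x', 100));

with insert2.sql through insert4.sql presumably targeting different
tables, so that several relations get extended concurrently.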

Hmm, I don't see a 5% regression; rather, it is a performance gain of
~5% with the patch. When we say 'regression', that indicates that
performance (TPS) is reduced with the patch, but I don't see that in
the above numbers. Kindly clarify.
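
Just to spell out the arithmetic: the %change column here is
(patched - HEAD) / HEAD * 100. At 74 clients, for example,
(348.809042 - 332.016217) / 332.016217 * 100 ~= +5.06% (the +5.05%
above, up to rounding), i.e. the patched build does about 5% *more*
TPS, which is a gain, not a regression.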

> >
> > >
> > > > I want to test the extension lock by blocking use of the FSM (use_fsm=false in the code). I think that if we block use of the FSM, then the load on the extension lock will increase. Is this the correct way to test?
> > > >
> > >
> > > Hmm, I think instead of directly hacking the code, you might want
> > > to use an operation (probably CLUSTER or VACUUM FULL) where we set
> > > HEAP_INSERT_SKIP_FSM. Along with this, you can try unlogged
> > > tables, because that might stress the extension lock.
> >
> > Okay. I will test.
>
> I tested with unlogged tables also. There, too, I was getting a 3-6% gain in TPS.
>
> >
> > >
> > > In the above test, you might want to test with a higher number of
> > > partitions (say up to 100) as well. Also, see if you want to use the
> > > COPY command.
> >
> > Okay. I will test.
>
> I tested with 500, 1000, and 2000 partitions. I observed a max +5% regression in the TPS, and there was no performance degradation.
>

Again, I am not sure if you are seeing a performance dip here. I think
your usage of the word 'regression' is not correct, or at least
confusing.
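
For the unlogged-table test mentioned above, something along these
lines should stress the extension lock without hacking use_fsm, since
the CLUSTER/VACUUM FULL rewrite path inserts with HEAP_INSERT_SKIP_FSM
(a sketch only; table and column names are made up):

CREATE UNLOGGED TABLE ext_test (id int, data text);
INSERT INTO ext_test
    SELECT g, repeat('x', 100) FROM generate_series(1, 1000000) g;
-- the rewrite skips the FSM, so each run has to extend the new
-- relation file block by block
VACUUM FULL ext_test;

Run something like this concurrently from many clients, each against
its own table, so that relation extension rather than FSM lookups
dominates.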

> For example:
> I created a table with 2000 partitions and then checked for false sharing.
> Slot Number  Slot Freq.    Slot Number  Slot Freq.    Slot Number  Slot Freq.
> 156          13            973          11            446          10
> 627          13            52           10            488          10
> 782          12            103          10            501          10
> 812          12            113          10            701          10
> 192          11            175          10            737          10
> 221          11            235          10            754          10
> 367          11            254          10            781          10
> 546          11            314          10            790          10
> 814          11            419          10            833          10
> 917          11            424          10            888          10
>
> From the above table, we can see that a total of 13 child tables fall into the same bucket (slot 156), so I did bulk-loading only into those 13 child tables to check the TPS under false sharing, but I noticed that there was no performance degradation.
>

Okay. Is it possible to share these numbers and scripts?
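
For anyone trying to reproduce the false-sharing analysis: the patch
keeps a fixed-size array of relation extension locks in shared memory
and maps each relation to an array slot by hashing its lock tag, so
two relations whose tags hash to the same index share one lock. A
rough sketch of the mapping (not the exact patch code; the constant
and header locations are approximate, see the posted patch for the
real definitions):

#include "postgres.h"
#include "common/hashfn.h"      /* tag_hash(); utils/hashutils.h on older branches */

#define N_RELEXTLOCK_ENTS 1024  /* number of lock slots (assumed) */

typedef struct RelExtLockTag
{
    Oid     dbid;               /* database OID */
    Oid     relid;              /* relation OID */
} RelExtLockTag;

static inline uint32
RelExtLockTargetTagToIndex(const RelExtLockTag *locktag)
{
    /* relations hashing to the same index share one lock slot */
    return tag_hash(locktag, sizeof(RelExtLockTag)) % N_RELEXTLOCK_ENTS;
}

With 2000 partitions hashed into on the order of a thousand slots,
some slots are bound to be shared, and the 13 relations in slot 156
above all contend on the same lock word, so bulk-loading only those
13 tables is a good worst-case probe; it is encouraging that it
showed no degradation.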

Thanks for doing the detailed tests for this patch.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
