Re: Speed up Clog Access by increasing CLOG buffers

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Speed up Clog Access by increasing CLOG buffers
Date: 2016-10-21 02:27:41
Message-ID: CAFiTN-tCqDucbikxue0x0dwNOAa6yzjMxJNgXGQm90jVVRSj0Q@mail.gmail.com

On Thu, Oct 20, 2016 at 9:03 PM, Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:

> In the results you've posted on 10/12, you've mentioned a regression with 32
> clients, where you got 52k tps on master but only 48k tps with the patch (so
> ~10% difference). I have no idea what scale was used for those tests,

That test was with scale factor 300 on a 4-socket POWER machine. I need
to repeat that test with multiple runs to confirm whether it was a real
regression or just run-to-run variation. I will do that soon and post
the results.

> and I
> see no such regression in the current results (but you only report results
> for some of the client counts).

This test is on an 8-socket x86 machine. At scale factor 1000 I have
given readings for all client counts (32, 64, 96, 192), but at scale
factor 300 I posted results only for 192 clients, because on this
machine I did not see much load on CLogControlLock at scale factor 300.
>
> Also, which of the proposed patches have you been testing?
I tested with the GroupLock patch.

> Can you collect and share a more complete set of data, perhaps based on the
> scripts I use to do tests on the large machine with 36/72 cores, available
> at https://bitbucket.org/tvondra/hp05-results ?

I think from my last run I did not share the data for the 8-socket x86
machine at scale factor 300 with 32, 64 and 96 clients. I already have
that data, so I am sharing it here. (Please let me know if you want to
see some other client count; for that I will need to run another test.)
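For reference, the wait-event counts below come from periodically
sampling pg_stat_activity during the run and then aggregating the
samples; roughly like the sketch below (my actual script may use a
slightly different query and sampling interval). The rows with an empty
wait event are samples where the backend was not waiting on anything.

# While pgbench is running, sample wait events twice a second.
while true; do
    psql -t -d postgres -c \
        "SELECT wait_event_type, wait_event FROM pg_stat_activity WHERE state = 'active';"
    sleep 0.5
done >> 300_32_ul.raw

# Afterwards, aggregate the samples into the counts shown below.
sort 300_32_ul.raw | uniq -c | sort -rn > 300_32_ul.txt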

Head:
scaling factor: 300
query mode: prepared
number of clients: 32
number of threads: 32
duration: 1800 s
number of transactions actually processed: 77233356
latency average: 0.746 ms
tps = 42907.363243 (including connections establishing)
tps = 42907.546190 (excluding connections establishing)
[dilip(dot)kumar(at)cthulhu bin]$ cat 300_32_ul.txt
111757 |
3666
1289 LWLockNamed | ProcArrayLock
1142 Lock | transactionid
318 LWLockNamed | CLogControlLock
299 Lock | extend
109 LWLockNamed | XidGenLock
70 LWLockTranche | buffer_content
35 Lock | tuple
29 LWLockTranche | lock_manager
14 LWLockTranche | wal_insert
1 Tuples only is on.
1 LWLockNamed | CheckpointerCommLock

Group Lock Patch:

scaling factor: 300
query mode: prepared
number of clients: 32
number of threads: 32
duration: 1800 s
number of transactions actually processed: 77544028
latency average: 0.743 ms
tps = 43079.783906 (including connections establishing)
tps = 43079.960331 (excluding connections establishing)
112209 |
3718
1402 LWLockNamed | ProcArrayLock
1070 Lock | transactionid
245 LWLockNamed | CLogControlLock
188 Lock | extend
80 LWLockNamed | XidGenLock
76 LWLockTranche | buffer_content
39 LWLockTranche | lock_manager
31 Lock | tuple
7 LWLockTranche | wal_insert
1 Tuples only is on.
1 LWLockTranche | buffer_mapping

Head:
number of clients: 64
number of threads: 64
duration: 1800 s
number of transactions actually processed: 76211698
latency average: 1.512 ms
tps = 42339.731054 (including connections establishing)
tps = 42339.930464 (excluding connections establishing)
[dilip(dot)kumar(at)cthulhu bin]$ cat 300_64_ul.txt
215734 |
5106 Lock | transactionid
3754 LWLockNamed | ProcArrayLock
3669
3267 LWLockNamed | CLogControlLock
661 Lock | extend
339 LWLockNamed | XidGenLock
310 Lock | tuple
289 LWLockTranche | buffer_content
205 LWLockTranche | lock_manager
50 LWLockTranche | wal_insert
2 LWLockTranche | buffer_mapping
1 Tuples only is on.
1 LWLockTranche | proc

GroupLock patch:
scaling factor: 300
query mode: prepared
number of clients: 64
number of threads: 64
duration: 1800 s
number of transactions actually processed: 76629309
latency average: 1.503 ms
tps = 42571.704635 (including connections establishing)
tps = 42571.905157 (excluding connections establishing)
[dilip(dot)kumar(at)cthulhu bin]$ cat 300_64_ul.txt
217840 |
5197 Lock | transactionid
3744 LWLockNamed | ProcArrayLock
3663
966 Lock | extend
849 LWLockNamed | CLogControlLock
372 Lock | tuple
305 LWLockNamed | XidGenLock
199 LWLockTranche | buffer_content
184 LWLockTranche | lock_manager
35 LWLockTranche | wal_insert
1 Tuples only is on.
1 LWLockTranche | proc
1 LWLockTranche | buffer_mapping

Head:
scaling factor: 300
query mode: prepared
number of clients: 96
number of threads: 96
duration: 1800 s
number of transactions actually processed: 77663593
latency average: 2.225 ms
tps = 43145.624864 (including connections establishing)
tps = 43145.838167 (excluding connections establishing)

302317 |
18836 Lock | transactionid
12912 LWLockNamed | CLogControlLock
4120 LWLockNamed | ProcArrayLock
3662
1700 Lock | tuple
1305 Lock | extend
1030 LWLockTranche | buffer_content
828 LWLockTranche | lock_manager
730 LWLockNamed | XidGenLock
107 LWLockTranche | wal_insert
4 LWLockTranche | buffer_mapping
1 Tuples only is on.
1 LWLockTranche | proc
1 BufferPin | BufferPin

Group Lock Patch:
scaling factor: 300
query mode: prepared
number of clients: 96
number of threads: 96
duration: 1800 s
number of transactions actually processed: 61608756
latency average: 2.805 ms
tps = 44385.885080 (including connections establishing)
tps = 44386.297364 (excluding connections establishing)
[dilip(dot)kumar(at)cthulhu bin]$ cat 300_96_ul.txt
237842 |
14379 Lock | transactionid
3335 LWLockNamed | ProcArrayLock
2850
1374 LWLockNamed | CLogControlLock
1200 Lock | tuple
992 Lock | extend
717 LWLockNamed | XidGenLock
625 LWLockTranche | lock_manager
259 LWLockTranche | buffer_content
105 LWLockTranche | wal_insert
4 LWLockTranche | buffer_mapping
2 LWLockTranche | proc

Head:
scaling factor: 300
query mode: prepared
number of clients: 192
number of threads: 192
duration: 1800 s
number of transactions actually processed: 65930726
latency average: 5.242 ms
tps = 36621.827041 (including connections establishing)
tps = 36622.064081 (excluding connections establishing)
[dilip(dot)kumar(at)cthulhu bin]$ cat 300_192_ul.txt
437848 |
118966 Lock | transactionid
88869 LWLockNamed | CLogControlLock
18558 Lock | tuple
6183 LWLockTranche | buffer_content
5664 LWLockTranche | lock_manager
3995 LWLockNamed | ProcArrayLock
3646
1748 Lock | extend
1635 LWLockNamed | XidGenLock
401 LWLockTranche | wal_insert
33 BufferPin | BufferPin
5 LWLockTranche | proc
3 LWLockTranche | buffer_mapping

GroupLock Patch:
scaling factor: 300
query mode: prepared
number of clients: 192
number of threads: 192
duration: 1800 s
number of transactions actually processed: 82616270
latency average: 4.183 ms
tps = 45894.737813 (including connections establishing)
tps = 45894.995634 (excluding connections establishing)
120372 Lock | transactionid
16346 Lock | tuple
7489 LWLockTranche | lock_manager
4514 LWLockNamed | ProcArrayLock
3632
3310 LWLockNamed | CLogControlLock
2287 LWLockNamed | XidGenLock
2271 Lock | extend
709 LWLockTranche | buffer_content
490 LWLockTranche | wal_insert
30 BufferPin | BufferPin
10 LWLockTranche | proc
6 LWLockTranche | buffer_mapping

Summary: On the 8-socket x86 machine at scale factor 300, I did not
observe significant waits on CLogControlLock up to 96 clients. At 192
clients we can see significant waits on CLogControlLock, but still not
as bad as what we see on POWER.
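To put a rough number on it, the share of samples waiting on
CLogControlLock can be computed directly from the captured files, for
example (just a sketch, using the 192-client file name from above):

# Fraction of wait-event samples that were waiting on CLogControlLock.
awk '{ total += $1 }
     /CLogControlLock/ { clog += $1 }
     END { printf "CLogControlLock: %.1f%% of samples\n", 100 * clog / total }' 300_192_ul.txt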

>
> I've taken some time to build a simple web-based reports from the results
> collected so far (also included in the git repository), and pushed them
> here:
>
> http://tvondra.bitbucket.org
>
> For each of the completed runs, there's a report comparing tps for different
> client counts with master and the three patches (average tps, median and
> stddev), and it's possible to download a more thorough text report with wait
> event stats, comparison of individual runs etc.

I saw your report; presenting the results this way gives a very clear picture.
>
> If you want to cooperate on this, I'm available - i.e. I can help you get
> the tooling running, customize it etc.

That would be really helpful; then next time I can present my reports
in the same format.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
