Re: POC: Cache data in GetSnapshotData()

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POC: Cache data in GetSnapshotData()
Date: 2015-05-20 14:26:39
Message-ID: CAA4eK1+V=f0frHqhzj1hWvx+WzjRQxOzmQg6ARjbRrmq=TKhDw@mail.gmail.com
Lists: pgsql-hackers

On Mon, Feb 2, 2015 at 8:57 PM, Andres Freund <andres(at)2ndquadrant(dot)com>
wrote:
>
> Hi,
>
> I've, for a while, pondered whether we couldn't find an easier way than
> CSN to make snapshots cheaper, as GetSnapshotData() very frequently is
> one of the top profile entries. Especially on bigger servers, where the
> pretty much guaranteed cache misses are quite visible.
>
> My idea is based on the observation that even in very write heavy
> environments the frequency of relevant PGXACT changes is noticeably
> lower than GetSnapshotData() calls.
>
>
> Comments about the idea?
>

I have done some tests with this patch to see the benefit, and it
seems to me that it helps reduce the contention around
ProcArrayLock, though the increase in TPS (around 2~4% in TPC-B
tests) is not very high.

LWLock_Stats data
-----------------------------
Non-Default postgresql.conf settings
------------------------------------------------------
scale_factor = 3000
shared_buffers=8GB
min_wal_size=15GB
max_wal_size=20GB
checkpoint_timeout =35min
maintenance_work_mem = 1GB
checkpoint_completion_target = 0.9
autovacuum=off
synchronous_commit=off

Tests were run on a POWER8 machine.

pgbench (TPC-B test)
./pgbench -c 64 -j 64 -T 1200 -M prepared postgres

Without Patch (HEAD - e5f455f5) - Commit used is slightly old, but
I don't think that matters for this test.

ProcArrayLock
--------------
PID 68803 lwlock main 4: shacq 1278232 exacq 124646 blk 231405 spindelay 2904 dequeue self 63701
PID 68888 lwlock main 4: shacq 1325048 exacq 129176 blk 241605 spindelay 3457 dequeue self 65203
PID 68798 lwlock main 4: shacq 1308114 exacq 127462 blk 235331 spindelay 2829 dequeue self 64893
PID 68880 lwlock main 4: shacq 1306959 exacq 127348 blk 235041 spindelay 3007 dequeue self 64662
PID 68894 lwlock main 4: shacq 1307710 exacq 127375 blk 234356 spindelay 3474 dequeue self 64417
PID 68858 lwlock main 4: shacq 1331912 exacq 129671 blk 238083 spindelay 3043 dequeue self 65257

CLogControlLock
----------------
PID 68895 lwlock main 11: shacq 483080 exacq 226903 blk 38253 spindelay 12 dequeue self 37128
PID 68812 lwlock main 11: shacq 471646 exacq 223555 blk 37703 spindelay 15 dequeue self 36616
PID 68888 lwlock main 11: shacq 475769 exacq 226359 blk 38570 spindelay 6 dequeue self 35804
PID 68798 lwlock main 11: shacq 473370 exacq 222993 blk 36806 spindelay 7 dequeue self 37163
PID 68880 lwlock main 11: shacq 472101 exacq 223031 blk 36577 spindelay 5 dequeue self 37544

With Patch -

ProcArrayLock
--------------
PID 159124 lwlock main 4: shacq 1196432 exacq 118140 blk 128880 spindelay 4601 dequeue self 91197
PID 159171 lwlock main 4: shacq 1322517 exacq 130560 blk 141830 spindelay 5180 dequeue self 101283
PID 159139 lwlock main 4: shacq 1294249 exacq 127877 blk 139318 spindelay 5735 dequeue self 100740
PID 159199 lwlock main 4: shacq 1077223 exacq 106398 blk 115625 spindelay 3627 dequeue self 81980
PID 159193 lwlock main 4: shacq 1364230 exacq 134757 blk 146335 spindelay 5390 dequeue self 103907

CLogControlLock
----------------
PID 159124 lwlock main 11: shacq 443221 exacq 202970 blk 88076 spindelay 533 dequeue self 70673
PID 159171 lwlock main 11: shacq 488979 exacq 227730 blk 103233 spindelay 597 dequeue self 76776
PID 159139 lwlock main 11: shacq 469582 exacq 218877 blk 94736 spindelay 493 dequeue self 74813
PID 159199 lwlock main 11: shacq 391470 exacq 181381 blk 74061 spindelay 309 dequeue self 64393
PID 159193 lwlock main 11: shacq 499489 exacq 235390 blk 106459 spindelay 578 dequeue self 76922

We can clearly see that the *blk* count for ProcArrayLock has
decreased significantly with the patch. The blk count for
CLogControlLock has increased, but that is the effect of the
contention shifting from one lock to the other.
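
For anyone who wants to compare the per-backend numbers without
eyeballing them, here is a minimal sketch that averages the blk field
per lock id from LWLock stats text in the format shown above. The names
ENTRY and mean_blk are made up for this illustration and are not part of
PostgreSQL or of the patch; the regex assumes exactly the field layout
seen in this message (lock id 4 under the ProcArrayLock heading, 11
under CLogControlLock).

```python
import re
from statistics import mean

# One stats entry in the layout shown in this message. Using \s+ between
# tokens lets an entry span a line break introduced by mail wrapping.
ENTRY = re.compile(
    r"PID\s+(\d+)\s+lwlock\s+main\s+(\d+):\s+"
    r"shacq\s+(\d+)\s+exacq\s+(\d+)\s+"
    r"blk\s+(\d+)\s+spindelay\s+(\d+)\s+"
    r"dequeue\s+self\s+(\d+)"
)


def mean_blk(stats_text):
    """Return {lock_id: mean blk count} over all entries in stats_text."""
    per_lock = {}
    for m in ENTRY.finditer(stats_text):
        lock_id = int(m.group(2))
        blk = int(m.group(5))
        per_lock.setdefault(lock_id, []).append(blk)
    return {lock_id: mean(values) for lock_id, values in per_lock.items()}


if __name__ == "__main__":
    # Two entries copied from the "Without Patch" ProcArrayLock section.
    sample = """
    PID 68803 lwlock main 4: shacq 1278232 exacq 124646 blk 231405 spindelay
    2904 dequeue self 63701
    PID 68888 lwlock main 4: shacq 1325048 exacq 129176 blk 241605 spindelay
    3457 dequeue self 65203
    """
    print(mean_blk(sample))
```

Running it once on the "Without Patch" text and once on the "With
Patch" text gives two dictionaries that can be compared per lock id.
Only the backends listed here are covered, so treat the averages as a
rough indication rather than a full-run statistic.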

+1 to proceed with this patch for 9.6, as I think it improves the
situation compared to the current behavior.

I have also seen a crash once in the following test scenario
(scale factor 300, other settings the same as above):
./pgbench -c 128 -j 128 -T 1800 -M prepared postgres

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
