Re: POC: Cache data in GetSnapshotData()

From: Mithun Cy <mithun(dot)cy(at)enterprisedb(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POC: Cache data in GetSnapshotData()
Date: 2017-08-29 09:04:40
Message-ID: CAD__Oui=Q++3ER0pU2HanJqE7nGtOhWxypxxJQVsJqNS91LG-g@mail.gmail.com
Lists: pgsql-hackers

On Thu, Aug 3, 2017 at 2:16 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> I think this patch should have a "cover letter" explaining what it's
> trying to achieve, how it does so and why it's safe/correct.

Cache the SnapshotData for reuse:
===========================
In one of our performance analyses with the perf tool, GetSnapshotData
showed up with a very high percentage of CPU cycles on a read-only
workload once the number of concurrent connections was high (>= 64).

Machine: cthulhu, an 8-node machine.
----------------------------------------------
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 8
NUMA node(s): 8
Vendor ID: GenuineIntel
CPU family: 6
Model: 47
Model name: Intel(R) Xeon(R) CPU E7- 8830 @ 2.13GHz
Stepping: 2
CPU MHz: 2128.000
BogoMIPS: 4266.63
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 24576K
NUMA node0 CPU(s): 0,65-71,96-103
NUMA node1 CPU(s): 72-79,104-111
NUMA node2 CPU(s): 80-87,112-119
NUMA node3 CPU(s): 88-95,120-127
NUMA node4 CPU(s): 1-8,33-40
NUMA node5 CPU(s): 9-16,41-48
NUMA node6 CPU(s): 17-24,49-56
NUMA node7 CPU(s): 25-32,57-64

On further analysis, this appears to be mainly due to cache misses
while iterating through the large proc array to compute the
SnapshotData. Also, on a read-only (read-heavy) workload the
SnapshotData mostly remains the same, since transactions rarely commit
or roll back. Taking advantage of this, we can save the previously
computed SnapshotData in shared memory, and in GetSnapshotData we can
use the saved SnapshotData instead of recomputing it while it is still
valid. A saved SnapshotData remains valid as long as no open
transaction has ended after it was taken.
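
To make the idea concrete, here is a minimal, self-contained sketch of
the scheme (stand-in types and names, not the actual patch code; in the
POC the cache lives in PROC_HDR in shared memory and writers are
serialized with a conditional LWLock, as discussed further below):

#include <stdbool.h>

typedef unsigned int TransactionId;

/* Simplified stand-in for PostgreSQL's SnapshotData. */
typedef struct MiniSnapshot
{
    TransactionId xmin;        /* all xids < xmin are already finished */
    TransactionId xmax;        /* xids >= xmax are treated as in progress */
    int           xcnt;        /* number of xids in xip[] */
    TransactionId xip[64];     /* in-progress xids; fixed size for the sketch */
} MiniSnapshot;

/* In the real patch this lives in shared memory, as part of PROC_HDR. */
static struct
{
    MiniSnapshot cached_snapshot;        /* previously computed snapshot */
    bool         cached_snapshot_valid;  /* still safe to reuse? */
} SnapCache;

/* Called wherever a transaction ends, i.e. latestCompletedXid advances. */
static void
InvalidateCachedSnapshot(void)
{
    SnapCache.cached_snapshot_valid = false;
}

/* Fast path of GetSnapshotData(): reuse the cached copy while it is valid. */
static bool
GetSnapshotFromCache(MiniSnapshot *snap)
{
    if (!SnapCache.cached_snapshot_valid)
        return false;          /* caller falls back to the proc-array scan */
    *snap = SnapCache.cached_snapshot;
    return true;
}

/* Slow path saves its freshly computed snapshot for the next caller. */
static void
StoreSnapshotInCache(const MiniSnapshot *snap)
{
    SnapCache.cached_snapshot = *snap;
    SnapCache.cached_snapshot_valid = true;
}

In the POC the validity flag is cleared under ProcArrayLock wherever
latestCompletedXid is advanced, and GetSnapshotData checks it while
holding ProcArrayLock in shared mode, which is what makes the reuse safe.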

[Perf cache-misses analysis of Base for 128 pgbench readonly clients]
======================================================
Samples: 421K of event 'cache-misses', Event count (approx.): 160069274
Overhead Command Shared Object Symbol
18.63% postgres postgres [.] GetSnapshotData
11.98% postgres postgres [.] _bt_compare
10.20% postgres postgres [.] PinBuffer
8.63% postgres postgres [.] LWLockAttemptLock
6.50% postgres postgres [.] LWLockRelease

[Perf cache-misses analysis with Patch for 128 pgbench readonly clients]
========================================================
Samples: 357K of event 'cache-misses', Event count (approx.): 102380622
Overhead Command Shared Object Symbol
0.27% postgres postgres [.] GetSnapshotData

With the patch, cache-miss events appear to have reduced significantly.

Test Setting:
=========
Server configuration:
./postgres -c shared_buffers=8GB -N 300 -c min_wal_size=15GB -c
max_wal_size=20GB -c checkpoint_timeout=900 -c
maintenance_work_mem=1GB -c checkpoint_completion_target=0.9 -c
wal_buffers=256MB &

pgbench configuration:
scale_factor = 300
./pgbench -c $threads -j $threads -T $time_for_reading -M prepared -S postgres

The machine has 64 cores; with this patch I can see the server start to
show improvement beyond 64 clients. I have tested up to 256 clients,
and the results show a performance improvement of up to nearly 39% [1].
This is the best case for the patch, where a once-computed SnapshotData
is reused over and over.

The worst case seems to be small, quick write transactions, which
frequently invalidate the cached SnapshotData before it can be reused
by a later GetSnapshotData call. As of now, I have tested simple-update
runs (pgbench -M prepared -N) on the same machine with the above
server configuration. I do not see much change in the TPS numbers.

All TPS numbers are the median of 3 runs.
Clients   TPS with Patch 05   TPS Base        %Diff
1           752.461117          755.186777    -0.3%
64        32171.296537        31202.153576    +3.1%
128       41059.660769        40061.929658    +2.49%

I will do some profiling to find out why this case is not costing us
any performance from the caching overhead.

>> diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
>> index a7e8cf2..4d7debe 100644
>> --- a/src/backend/storage/ipc/procarray.c
>> +++ b/src/backend/storage/ipc/procarray.c
>> @@ -366,11 +366,13 @@ ProcArrayRemove(PGPROC *proc, TransactionId latestXid)
>> (arrayP->numProcs - index - 1) * sizeof(int));
>> arrayP->pgprocnos[arrayP->numProcs - 1] = -1; /* for debugging */
>> arrayP->numProcs--;
>> + ProcGlobal->is_snapshot_valid = false;
>> LWLockRelease(ProcArrayLock);
>> return;
>> }
>> }
>>
>> + ProcGlobal->is_snapshot_valid = false;
>> /* Oops */
>> LWLockRelease(ProcArrayLock);
>
> Why do we need to do anything here? And if we need to, why not in
> ProcArrayAdd?

-- In ProcArrayRemove, ShmemVariableCache->latestCompletedXid is set to
latestXid, which changes the xmax computed by GetSnapshotData, hence
the snapshot is invalidated there. ProcArrayAdd does not seem to affect
the SnapshotData. I have now modified the code to set
cached_snapshot_valid to false just below the statement
ShmemVariableCache->latestCompletedXid = latestXid only:
+    if (TransactionIdIsValid(latestXid))
+    {
+        Assert(TransactionIdIsValid(allPgXact[proc->pgprocno].xid));
+
+        /* Advance global latestCompletedXid while holding the lock */
+        if (TransactionIdPrecedes(ShmemVariableCache->latestCompletedXid,
+                                  latestXid))
+            ShmemVariableCache->latestCompletedXid = latestXid;
+
+        /* xmax of any new snapshot changes here, so the cache is stale */
+        ProcGlobal->cached_snapshot_valid = false;

>> @@ -1552,6 +1557,8 @@ GetSnapshotData(Snapshot snapshot)
>> globalxmin = xmin = xmax;
>>
>> - snapshot->takenDuringRecovery = RecoveryInProgress();
>> -
>
> This movement of code seems fairly unrelated?

-- Sorry, that is unrelated; it was part of an early POC patch, so I had
retained it as is. I have now reverted those changes and moved them back
under the ProcArrayLock.

>> if (!snapshot->takenDuringRecovery)
>> {
>
> Do we really need to restrict this to !recovery snapshots? It'd be nicer
> if we could generalize it - I don't immediately see anything !recovery
> specific in the logic here.

-- Agreed. I will add one more patch to cover the standby (recovery)
case as well, so that all the cases where we can cache the snapshot are
unified. In this patch, I have moved the code into two functions: one to
cache the snapshot data and another to get the snapshot data from the
cache. In the next patch, I will try to unify these with the recovery
(standby) case. For the recovery case we copy the xids from a simple
xid array, KnownAssignedXids[], and I am not completely sure whether
caching them brings similar performance benefits to the above, so I
will also try to get a perf report for that.
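
For reference, the two helpers look roughly like the sketch below. This
is written as if inside procarray.c and is only illustrative, not the
exact patch code: the helper names are placeholders, subtransaction
handling is omitted, and it assumes cached_snapshot.xip points at an
array allocated in shared memory large enough for the maximum number of
backends. The field names (cached_snapshot, cached_snapshot_valid,
CachedSnapshotLock) follow the POC patch.

/* Sketch only; callers hold ProcArrayLock in shared mode. */
static bool
GetSnapshotDataFromCache(Snapshot snapshot)
{
    if (!ProcGlobal->cached_snapshot_valid)
        return false;

    /* Copy the previously computed snapshot instead of scanning procs. */
    snapshot->xmin = ProcGlobal->cached_snapshot.xmin;
    snapshot->xmax = ProcGlobal->cached_snapshot.xmax;
    snapshot->xcnt = ProcGlobal->cached_snapshot.xcnt;
    memcpy(snapshot->xip, ProcGlobal->cached_snapshot.xip,
           snapshot->xcnt * sizeof(TransactionId));
    return true;
}

static void
CacheSnapshotData(Snapshot snapshot)
{
    /* Let only one backend write the cache; everyone else just skips it. */
    if (!LWLockConditionalAcquire(&ProcGlobal->CachedSnapshotLock,
                                  LW_EXCLUSIVE))
        return;

    if (!ProcGlobal->cached_snapshot_valid)
    {
        ProcGlobal->cached_snapshot.xmin = snapshot->xmin;
        ProcGlobal->cached_snapshot.xmax = snapshot->xmax;
        ProcGlobal->cached_snapshot.xcnt = snapshot->xcnt;
        memcpy(ProcGlobal->cached_snapshot.xip, snapshot->xip,
               snapshot->xcnt * sizeof(TransactionId));
        ProcGlobal->cached_snapshot_valid = true;
    }
    LWLockRelease(&ProcGlobal->CachedSnapshotLock);
}

GetSnapshotData() tries the first helper while holding ProcArrayLock in
shared mode and, when that fails, does the usual proc-array scan and
then calls the second helper to save the result.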

>
>> + * Let only one of the caller cache the computed snapshot, and
>> + * others can continue as before.
>> */
>> - if (!suboverflowed)
>> + if (!ProcGlobal->is_snapshot_valid &&
>> + LWLockConditionalAcquire(&ProcGlobal->CachedSnapshotLock,
>> + LW_EXCLUSIVE))
>> {
>
> I'd also move this to a helper function (the bit inside the lock).

-- Fixed as suggested.

>> +
>> + /*
>> + * The memory barrier has be to be placed here to ensure that
>> + * is_snapshot_valid is set only after snapshot is cached.
>> + */
>> + pg_write_barrier();
>> + ProcGlobal->is_snapshot_valid = true;
>> + LWLockRelease(&ProcGlobal->CachedSnapshotLock);
>
> LWLockRelease() is a full memory barrier, so this shouldn't be required.

-- Okay. Sorry, just to confirm my understanding: do you want me to set
ProcGlobal->is_snapshot_valid after
LWLockRelease(&ProcGlobal->CachedSnapshotLock)?

>> + SnapshotData cached_snapshot; /* Previously saved snapshot */
>> + volatile bool is_snapshot_valid; /* is above snapshot valid */
>> + TransactionId cached_snapshot_globalxmin; /* globalxmin when above
>
>
> I'd rename is_snapshot_valid to 'cached_snapshot_valid' or such, it's
> not clear from the name what it refers to.

-- Fixed; renamed as suggested.

>
>> + LWLock CachedSnapshotLock; /* A try lock to make sure only one
>> + * process write to cached_snapshot */
>> } PROC_HDR;
>>
>
> Hm, I'd not add the lock to the struct, we normally don't do that for
> the "in core" lwlocks that don't exist in a "configurable" number like
> the buffer locks and such.
>

-- I am not really sure about this. Could you please help me understand
what exactly I should do here to get this corrected? Do I have to add it
to lwlocknames.txt as a new LWLock?
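If that is the idea, I suppose it would just be a one-line addition to
src/backend/storage/lmgr/lwlocknames.txt, something like the line below
(NN being the next unused id, with the build then generating the
corresponding definition in lwlocknames.h), but please correct me if I
have misunderstood:

CachedSnapshotLock					NN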

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
cache_the_snapshot_performance.ods application/vnd.oasis.opendocument.spreadsheet 17.8 KB
Cache_data_in_GetSnapshotData_POC_05.patch application/octet-stream 18.5 KB
