Re: shared-memory based stats collector

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: thomas(dot)munro(at)gmail(dot)com
Cc: tomas(dot)vondra(at)2ndquadrant(dot)com, andres(at)anarazel(dot)de, a(dot)zakirov(at)postgrespro(dot)ru, alvherre(at)2ndquadrant(dot)com, ah(at)cybertec(dot)at, magnus(at)hagander(dot)net, robertmhaas(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers(at)postgresql(dot)org
Subject: Re: shared-memory based stats collector
Date: 2019-07-04 10:27:54
Message-ID: 20190704.192754.27063464.horikyota.ntt@gmail.com
Lists: pgsql-hackers

Hello.

At Mon, 1 Jul 2019 23:19:31 +1200, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote in <CA+hUKGK5WNCEe9g4ie=-6Oym-WNqYBXX9A1qPgKv89KGkzW72g(at)mail(dot)gmail(dot)com>
> > Fixed. Version 20.
>
> Hello Horiguchi-san,
>
> A new Commitfest is here. This doesn't apply (maybe just because of
> the new improved pgindent). Could we please have a fresh rebase?

Thank you for noticing, Thomas. I rebased the patch set and made some
improvements. On top of that, I wrote a new test script.

- Rebased.

- Reworded almost all of the comments; many of them turned out to be
  broken. Added some new comments.

- Shortened some LWLock-protected code paths.

- Got rid of a useless palloc for the snapshot area of globalStats.

The attached files are:

gendb.pl:
  Script to generate databases.

statbench.pl:
  Benchmarking script.

0001-sequential-scan-for-dshash.patch:
  Adds a sequential scan feature to dshash.

0002-Add-conditional-lock-feature-to-dshash.patch:
  Adds a conditional lock feature to dshash.

0003-Make-archiver-process-an-auxiliary-process.patch:
  Changes the archiver process into an auxiliary process. This is
  needed to let the archiver access shared memory.

0004-Shared-memory-based-stats-collector.patch:
  The body of this patch set. Changes pgstat from a separate process
  connected by a socket to shared memory shared among backends.

0005-Remove-the-GUC-stats_temp_directory.patch:
  Removes the GUC stats_temp_directory. Separated from 0004 to keep
  that patch smaller.

====
Tomas said upthread:
> > While reviewing the patch I've always had issue with evaluating how it
> > behaves for various scenarios / workloads. The reviews generally did one
> > specific benchmark, but I find that unsatisfactory. I wonder whether if
> > we could develop a small set of more comprehensive workloads for this
> > patch (i.e. different numbers of objects, access patterns, ...).

The structure of the shared stats area is as follows:

dshash for database stats
 + dshash entry for db1
     + dshash for table stats
         + dshash entry for tbl1
         + dshash entry for tbl2
         + dshash entry for tbl3
         ...
 + dshash entry for db2
     ...
 + dshash entry for db3
     ...
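In C terms, the nesting might look roughly like the sketch below; the
type and field names are mine for illustration and do not match the
patch exactly.

#include "postgres.h"
#include "lib/dshash.h"

/*
 * Illustrative only: a per-database entry in the top-level dshash carries
 * handles to the nested dshashes that hold that database's table and
 * function stats.  The real structure in the patch has more fields.
 */
typedef struct DbStatsEntrySketch
{
	Oid			databaseid;		/* hash key: database OID */
	/* ... per-database counters would live here ... */
	dshash_table_handle tables;		/* nested dshash of per-table stats */
	dshash_table_handle functions;	/* nested dshash of per-function stats */
} DbStatsEntrySketch;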

Dshash allows an entry to be accessed by only a single process while
it is held (locked). This is quite inconvenient, since the database
hash entry would become a bottleneck. On the other hand, a dbstats
dshash entry that a backend is using is never removed out from under
it, because an entry is removed only after all accessors to the
database are gone. So this patch releases the dbstats dshash entry
immediately after looking it up, so that it does not become a
bottleneck.
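That early release could look roughly like this, using the existing
dshash_find()/dshash_release_lock() interface; the entry type is the
illustrative one from the sketch above.

#include "postgres.h"
#include "lib/dshash.h"

/*
 * Sketch only: look the database entry up, then drop the dshash lock right
 * away.  Keeping a pointer to the entry is assumed to be safe here because
 * the entry is removed only after every accessor of the database is gone.
 */
static DbStatsEntrySketch *
fetch_dbentry_sketch(dshash_table *db_stats, Oid databaseid)
{
	DbStatsEntrySketch *dbentry;

	dbentry = dshash_find(db_stats, &databaseid, false);	/* shared lock */
	if (dbentry == NULL)
		return NULL;

	/* release immediately so the entry doesn't become a bottleneck */
	dshash_release_lock(db_stats, dbentry);

	return dbentry;
}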

Another bottleneck would be lock conflicts on a database/table/function
stats entry. This is avoided by, like the existing stats collector,
enforcing an interval of at least 500 ms (PGSTAT_STAT_MIN_INTERVAL)
between two successive updates from one process, and by skipping the
update if the lock would conflict.
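For the skip-on-conflict part, I imagine 0002's conditional lock being
used roughly as below; dshash_find_nowait() is only a placeholder for
whatever name and signature that API actually has.

#include "postgres.h"
#include "lib/dshash.h"

/*
 * Placeholder prototype: a lookup that returns NULL instead of blocking
 * when the entry's lock is already taken.  The real conditional-lock API
 * added by 0002 may well look different.
 */
extern void *dshash_find_nowait(dshash_table *hash_table,
								const void *key, bool exclusive);

/*
 * Sketch: try to update a table's shared stats entry, but give up (and keep
 * the counts pending for the next round) if the lock would conflict.
 */
static bool
try_update_table_entry(dshash_table *tabhash, Oid relid)
{
	void	   *tabentry;

	tabentry = dshash_find_nowait(tabhash, &relid, true);
	if (tabentry == NULL)
		return false;			/* lock conflict (or no entry); retry later */

	/* ... add the pending counts into the shared entry ... */

	dshash_release_lock(tabhash, tabentry);
	return true;
}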

Yet another bottleneck was the conflict between reset and update.
Since all processes work on the same data in shared memory, counters
cannot be reset until all referrers are gone. So I let a dbentry have
two sets of table/function stats dshashes, in order to separate
accessors that arrive after a reset from the existing accessors. A
process "pins" the current table/function dshashes before accessing
them (pin_hashes()/unpin_hashes()), and all updates in that round are
performed on the pinned generation of dshashes. If two or more reset
requests arrive within a very short time, the requests other than the
first one are simply ignored (reset_dbentry_counters()). As a
consequence, a client can see some non-zero numbers just after a reset
if many processes reset stats at the same time, but I don't think that
is worth amending.
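The two-generation idea, in a heavily simplified sketch; the fields,
the locking assumptions, and the logic of pin_hashes()/unpin_hashes()
shown here are illustrative, not the patch's actual code.

#include "postgres.h"
#include "lib/dshash.h"

/*
 * Heavily simplified sketch of the two-generation scheme.  A dbentry keeps
 * two sets of table/function dshash handles; "generation" selects the live
 * set (0 or 1) and refcnt[] counts the processes that have pinned each set.
 * A reset flips "generation" so that new accessors see fresh hashes, while
 * already-pinned accessors keep updating the old ones until they unpin.
 * The caller is assumed to hold the dbentry's lock around these calls.
 */
typedef struct DbHashesSketch
{
	int			generation;			/* which set is current, 0 or 1 */
	int			refcnt[2];			/* pin counts for both sets */
	dshash_table_handle tables[2];	/* per-table stats, both generations */
	dshash_table_handle funcs[2];	/* per-function stats, both generations */
} DbHashesSketch;

static int
pin_hashes_sketch(DbHashesSketch *de)
{
	int			gen = de->generation;

	de->refcnt[gen]++;
	return gen;					/* all updates this round go to this set */
}

static void
unpin_hashes_sketch(DbHashesSketch *de, int gen)
{
	de->refcnt[gen]--;
	if (de->refcnt[gen] == 0 && gen != de->generation)
	{
		/* last user of a superseded generation: its hashes can be dropped */
	}
}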

With that, almost all potential bottlenecks are eliminated in this
patch. If n clients are running, the mean interval between updates
would be 500/n ms, so 1000 or more clients could still hit the
bottleneck, but I think the current stats collector also suffers with
that many clients. (Whatever would happen with such a massive number
of processes, I don't have an environment that can keep that many
clients/backends alive...)

That being said, this patch is still bound to flush stats within a
reasonable period, 1000 ms in this patch (PGSTAT_STAT_MAX_INTERVAL).
Once that duration has elapsed, the patch waits for all the required
locks to be acquired. So the possible bottleneck remains the database
and table/function shared hash entries: many processes on the same
database updating the same table can effectively cause such conflicts.
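Put together, the flush timing amounts to something like the sketch
below; the two interval constants are the ones named in this mail,
while flush_pending_stats() is just an invented stand-in for the
actual flush routine.

#include "postgres.h"
#include "utils/timestamp.h"

/* Interval values as described in this mail (milliseconds). */
#define PGSTAT_STAT_MIN_INTERVAL	500
#define PGSTAT_STAT_MAX_INTERVAL	1000

/* Invented stand-in: flush pending counts, optionally giving up on conflict. */
extern bool flush_pending_stats(bool nowait);

/*
 * Sketch of the per-process flush decision: under MIN_INTERVAL do nothing,
 * between MIN and MAX try without blocking and skip on conflict, and past
 * MAX block until the required locks are obtained.
 */
static void
maybe_flush_stats_sketch(TimestampTz *last_flush)
{
	TimestampTz now = GetCurrentTimestamp();

	if (!TimestampDifferenceExceeds(*last_flush, now,
									PGSTAT_STAT_MIN_INTERVAL))
		return;					/* too soon; keep the counts pending */

	if (!TimestampDifferenceExceeds(*last_flush, now,
									PGSTAT_STAT_MAX_INTERVAL))
	{
		if (!flush_pending_stats(true))		/* nowait: skip on conflict */
			return;
	}
	else
		(void) flush_pending_stats(false);	/* overdue: wait for the locks */

	*last_flush = now;
}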

I remade the benchmark script so that many parameters can be changed
easily, and took numbers for the following access patterns. Every
number is the mean of 6 runs. I chose the configurations so that no
disk access happens while running the benchmark.

#db       : number of databases accessed
#tbl      : number of tables per database
#clients  : number of stats-updating clients
#iter     : number of query iterations
#xactlen  : number of queries per transaction
#referrers: number of stats-referencing clients

     #db  #tbl  #clients  #iter  #xactlen  #referrers
A1:    1     1         1  20000        10           0
A2:    1     1         1  20000        10           1
B1:    1     1        90   2000        10           0
B2:    1     1        90   2000        10           1
C1:    1    50        90   2000        10           0
C2:    1    50        90   2000        10           1
D1:   50     1        90   2000        10           0
D2:   50     1        90   2000        10           1
E1:   50     1        90   2000        10          10
F1:   50     1        10   2000        10          90

      master                                 patched
      updater            referrer            updater            referrer
      time / stdev       count / stdev       time / stdev       count / stdev
A1:   1769.13 / 71.87                        1729.97 / 61.58
A2:   1903.94 / 75.65    2906.67 / 78.28     1849.41 / 43.00    2855.33 / 62.95
B1:   2967.84 /  9.88                        2984.20 /  6.10
B2:   3005.38 /  5.32     253.00 / 33.09     3007.26 /  5.70     253.33 / 60.63
C1:   3066.14 / 13.80                        3069.34 / 11.65
C2:   3353.66 /  8.14     282.92 / 20.65     3341.36 / 12.44     251.65 / 21.13
D1:   2977.12 /  5.12                        2991.60 /  6.68
D2:   3005.80 /  6.44     252.50 / 38.34     3010.58 /  7.34     282.33 / 57.07
E1:   3255.47 /  8.91     244.02 / 17.03     3293.88 / 18.05     249.13 / 14.58
F1:   2620.85 /  9.17     202.46 /  3.35     2668.60 / 41.04     208.19 /  6.79

ratio (100: same; smaller value means the patched version is faster)

      updater              referrer
      patched/master (%)   master/patched (%)
A1:    97.79                    -
A2:    97.14                101.80
B1:   100.55
B2:   100.06                 99.87
C1:   100.10
C2:    99.63                112.43
D1:   100.49
D2:   100.16                 89.43
E1:   101.18                 97.95
F1:   101.82                 97.25

Mmm... I don't see a distinctive tendency. The referrer side shows
larger fluctuation, but I'm not sure that suggests anything
meaningful.

I'll rerun the benchmarks over a longer period (more iterations).

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
0001-sequential-scan-for-dshash.patch text/x-patch 10.6 KB
0002-Add-conditional-lock-feature-to-dshash.patch text/x-patch 5.0 KB
0003-Make-archiver-process-an-auxiliary-process.patch text/x-patch 11.9 KB
0004-Shared-memory-based-stats-collector.patch text/x-patch 216.2 KB
0005-Remove-the-GUC-stats_temp_directory.patch text/x-patch 10.6 KB
