Re: Protect syscache from bloating with negative cache entries

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: robertmhaas(at)gmail(dot)com
Cc: ideriha(dot)takeshi(at)jp(dot)fujitsu(dot)com, tomas(dot)vondra(at)2ndquadrant(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, andres(at)anarazel(dot)de, tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com, alvherre(at)2ndquadrant(dot)com, bruce(at)momjian(dot)us, pgsql-hackers(at)lists(dot)postgresql(dot)org, michael(dot)paquier(at)gmail(dot)com, david(at)pgmasters(dot)net, craig(at)2ndquadrant(dot)com
Subject: Re: Protect syscache from bloating with negative cache entries
Date: 2019-07-01 07:02:59
Message-ID: 20190701.160259.247597303.horikyota.ntt@gmail.com
Lists: pgsql-hackers

Hello,

my_gripe> But, still fluctuates by around 5%..
my_gripe>
my_gripe> If this level of the degradation is still not acceptable, that
my_gripe> means that nothing can be inserted in the code path and the new
my_gripe> code path should be isolated from existing code by using indirect
my_gripe> call.

Finally, after some struggling, I think I have managed to measure
the impact on performance precisely and reliably. Starting from
"make distclean" for every build, then removing everything in
$TARGET before installation, makes things stable enough. (I don't
think that's good, but I didn't investigate the cause.)

I measured time/call by directly calling SearchSysCache3() many
times. It showed that the patch causes around 0.1 microsec
degradation per call. (The function overall took about 6.9
microsec on average.)
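
For illustration only, a time/call figure of this kind comes from a
loop of the following shape. This is a Python sketch, not the
catcachebench code in 0002; time_per_call() and the dictionary
lookup standing in for SearchSysCache3() are hypothetical.

```python
import time

def time_per_call(func, n_calls):
    """Return the average wall-clock time per call of func, in microseconds."""
    start = time.perf_counter()
    for _ in range(n_calls):
        func()
    elapsed = time.perf_counter() - start
    return elapsed / n_calls * 1e6

# Hypothetical stand-in for a cache lookup; the measurement in this mail
# calls SearchSysCache3() directly instead.
cache = {("rel", 1, False): "tuple"}

def lookup():
    return cache.get(("rel", 1, False))

per_call_us = time_per_call(lookup, 100_000)
```

The difference between two such figures (patched versus master) is
what the 0.1 microsec above refers to.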

Next, I counted how many times SearchSysCache is called during
planning of, for instance, a query on a partitioned table
having 3000 columns and 1000 partitions.

explain analyze select sum(c0000) from test.p;

The planner made 6020608 syscache calls while planning, and the
overall planning time was 8641 ms. (Exec time was 48 ms.) 6020608
times 0.1 us is 602 ms of degradation, so the estimated
degradation in planning time is roughly 7%. The degradation comes
from really only two successive instructions, "ADD/conditional
MOVE(CMOVE)". That leads to the conclusion that the existing
code path as is doesn't have room for any additional code.
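
The estimate above, spelled out as arithmetic. The numbers are the
ones measured in this mail; the variable names are only for
illustration.

```python
calls = 6_020_608      # syscache calls during planning
overhead_us = 0.1      # added cost per call, in microseconds
planning_ms = 8641     # overall planning time, in milliseconds

# Total added time in milliseconds, and its share of the planning time.
added_ms = calls * overhead_us / 1000.0
share = added_ms / planning_ms

print(f"added: {added_ms:.0f} ms, share of planning time: {share:.1%}")
```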

So I looked for room for at least one branch, and found it (on
gcc 7.3.1/CentOS7/x64). Interestingly, de-inlining
SearchCatCacheInternal gave a performance gain of about
3%. Additionally inlining CatalogCacheComputeHashValue() gave
another gain of about 3%. I could add a branch in
SearchCatCacheInternal within that gain.

I also tried indirect calls, but the degradation overwhelmed the
gain, so I chose branching rather than indirect calls. I didn't
investigate why that happens.

The following is the result. The binaries are built with the same
configuration using -O2.

binary:
  master      : master HEAD.
  patched_off : patched, but pruning disabled (catalog_cache_prune_min_age=-1).
  patched_on  : patched with pruning enabled
                (catalog_cache_prune_min_age is "300s" for bench 1,
                "1s" for 2, "0" for 3).

bench:
  1: corresponds to catcachebench(1); fetching STATRELATTINH 3000
     * 1000 times, generating new cache entries. (Massive cache
     creation)
     Pruning doesn't happen while running this.

  2: catcachebench(2); 60000 cache accesses on 1000
     STATRELATTINH entries. (Frequent cache reference)
     Pruning doesn't happen while running this.

  3: catcachebench(3); fetching 1000(tbls) * 3000(cols)
     STATRELATTINH entries. The catcache clock advances at an
     interval of 100(tbls) * 3000(cols) accesses, and
     pruning happens.

     While running catcachebench(3) once, pruning happens 28
     times, most of the time 202202 entries are removed, and
     the total number of entries is limited to 524289. (The
     systable has 3000 * 1001 = 3003000 tuples.)

iter: Number of iterations. Time ms and stddev are calculated over
      the iterations.
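
To picture what bench 3 exercises, here is a toy model of
clock-based pruning in Python. It is a sketch under my own
assumptions, not the code in 0004: the class, the method names and
the exact age comparison are hypothetical. Only the ideas come
from the description above: a negative min age disables pruning,
entries are stamped with a clock that advances at intervals, and
entries not accessed recently enough are removed.

```python
class PrunableCache:
    """Toy cache whose entries remember the clock value of their last access."""

    def __init__(self, min_age):
        self.min_age = min_age   # plays the role of catalog_cache_prune_min_age
        self.clock = 0           # advanced by the caller, not on every access
        self.entries = {}        # key -> [value, last_access_clock]

    def lookup(self, key, load):
        entry = self.entries.get(key)
        if entry is None:
            # Miss: fetch from the expensive backing store (the catalog).
            entry = [load(key), self.clock]
            self.entries[key] = entry
        else:
            entry[1] = self.clock   # hit: stamp with the current clock
        return entry[0]

    def advance_clock_and_prune(self, now):
        """Move the clock to now and drop entries idle for more than min_age."""
        self.clock = now
        if self.min_age < 0:        # assumed: a negative value disables pruning
            return 0
        stale = [k for k, (_, last) in self.entries.items()
                 if now - last > self.min_age]
        for k in stale:
            del self.entries[k]
        return len(stale)


loads = []

def load(key):
    loads.append(key)               # record every trip to the backing store
    return key * 2

c = PrunableCache(min_age=1)
c.lookup(1, load)
c.lookup(2, load)
first = c.advance_clock_and_prune(1)    # both entries have age 1: kept
c.lookup(2, load)                       # refreshes key 2 at clock 1
second = c.advance_clock_and_prune(2)   # key 1 has age 2: removed
c.lookup(1, load)                       # miss again, reloaded from the store
```

The last lookup shows the trade-off: a pruned entry costs another
access to the backing store the next time it is needed.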

   binary    | bench | iter | time ms  | stddev
-------------+-------+------+----------+--------
 master      |     1 |   10 |  8150.30 |  12.96
 master      |     2 |   10 |  4002.88 |  16.18
 master      |     3 |   10 |  9065.06 |  11.46
-------------+-------+------+----------+--------
 patched_off |     1 |   10 |  8090.95 |   9.95
 patched_off |     2 |   10 |  3984.67 |  12.33
 patched_off |     3 |   10 |  9050.46 |   4.64
-------------+-------+------+----------+--------
 patched_on  |     1 |   10 |  8158.95 |   6.29
 patched_on  |     2 |   10 |  4023.72 |  10.41
 patched_on  |     3 |   10 | 16532.66 |  18.39

patched_off is slightly faster than master. patched_on is
generally a bit slower. Even though patched_on/3 seems to take too
long, the extra time comes from increased catalog table
access in exchange for memory saving. (That is, it is expected
behavior.) I ran it several times and most of the runs showed the
same tendency.
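
For reference, the relative differences can be recomputed from the
"time ms" column with a few lines of Python. The helper name
relative_to_master() is only for illustration.

```python
# "time ms" values copied from the result table in this mail.
times = {
    "master":      {1: 8150.30, 2: 4002.88, 3: 9065.06},
    "patched_off": {1: 8090.95, 2: 3984.67, 3: 9050.46},
    "patched_on":  {1: 8158.95, 2: 4023.72, 3: 16532.66},
}

def relative_to_master(binary, bench):
    """Percentage change in run time versus master (negative means faster)."""
    base = times["master"][bench]
    return (times[binary][bench] - base) / base * 100.0

changes = {(b, n): round(relative_to_master(b, n), 2)
           for b in ("patched_off", "patched_on")
           for n in (1, 2, 3)}

for (binary, bench), pct in sorted(changes.items()):
    print(f"{binary:12s} bench {bench}: {pct:+.2f}%")
```

patched_off comes out below zero for all three benches, patched_on
is well under 1% slower for benches 1 and 2, and patched_on/3 is
roughly 82% slower.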

As a side effect, once the branch is added, the shared syscache in a
neighbouring thread will be able to be added alongside it without
impact on the existing code path.

===
The benchmark script is used as follows:

- Create many (3000, as an example) tables in the "test" schema. I
created a partitioned table with 3000 children.

- The tables have many columns, 1000 for me.

- Run the following commands.

=# select catcachebench(0); -- warm up systables.
=# set catalog_cache_prune_min_age = any; -- as required
=# select catcachebench(n); -- 3 >= n >= 1, the number of "bench" above.

The above result was taken with the following query.

=# select 'patched_on', '3' , count(a), avg(a)::numeric(10,2), stddev(a)::numeric(10,2) from (select catcachebench(3) from generate_series(1, 10)) as a(a);

====
The attached patches are:

0001-Adjust-inlining-of-some-functions.patch:

Changes inlining property of two functions,
SearchCatCacheInternal and CatalogCacheComputeHashValue.

0002-Benchmark-extension-and-required-core-change.patch:

Micro benchmark of SearchSysCache3() and core-side tweaks, which
are outside this patch set in terms of functionality. Works for
0001 but not for 0004 or later. 0003 adjusts that.

0003-Adjust-catcachebench-for-later-patches.patch

Adjustment of 0002, the benchmark, for 0004, the body of this
patchset. Breaks code consistency until 0004 is applied.

0004-Catcache-pruning-feature.patch

The feature patch. It intentionally leaves the indentation of an
existing code block in SearchCatCacheInternal unchanged to keep
the patch small. It is adjusted in the next patch, 0005.

0005-Adjust-indentation-of-SearchCatCacheInternal.patch

Adjusts indentation of 0004.

0001+4+5 is the final shape of the patch set and 0002+3 is only
for benchmarking.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment                                                   | Content-Type | Size
-------------------------------------------------------------+--------------+---------
v18-0001-Adjust-inlining-of-some-functions.patch             | text/x-patch | 2.5 KB
v18-0002-Benchmark-extension-and-required-core-change.patch  | text/x-patch | 10.0 KB
v18-0003-Adjust-catcachebench-for-later-patches.patch        | text/x-patch | 1.2 KB
v18-0004-Catcache-pruning-feature.patch                      | text/x-patch | 10.2 KB
v18-0005-Adjust-indentation-of-SearchCatCacheInternal.patch  | text/x-patch | 3.5 KB
