Re: Protect syscache from bloating with negative cache entries

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: robertmhaas(at)gmail(dot)com
Cc: ideriha(dot)takeshi(at)jp(dot)fujitsu(dot)com, tomas(dot)vondra(at)2ndquadrant(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, andres(at)anarazel(dot)de, tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com, alvherre(at)2ndquadrant(dot)com, bruce(at)momjian(dot)us, pgsql-hackers(at)lists(dot)postgresql(dot)org, michael(dot)paquier(at)gmail(dot)com, david(at)pgmasters(dot)net, craig(at)2ndquadrant(dot)com
Subject: Re: Protect syscache from bloating with negative cache entries
Date: 2019-03-29 08:24:40
Message-ID: 20190329.172440.199616830.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

Hello. Sorry for being late a bit.

At Wed, 27 Mar 2019 17:30:37 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20190327(dot)173037(dot)40342566(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> > I don't see much point in continuing to review this patch at this
> > point. There's been no new version of the patch in 3 weeks, and there
> > is -- in my view at least -- a rather frustrating lack of evidence
> > that the complexity this patch introduces is actually beneficial. No
> > matter how many people +1 the idea of making this more complicated, it
> > can't be justified unless you can provide a test result showing that
> > the additional complexity solves a problem that does not get solved
> > without that complexity. And even then, who is going to commit a
> > patch that uses a design which Tom Lane says was tried before and
> > stunk?
>
> Hmm. Anyway it is hit by recent commit. I'll post a rebased
> version and a version reverted to do hole-scan. Then I'll take
> numbers as far as I can and will show the result.. tomorrow.

I took performance numbers for master and three versions of the
patch. Master, LRU, full-scan, modified full-scan. I noticed that
useless scan can be skipped in full-scan version so I added the
last versoin.

I ran three artificial test cases. The database is created by
gen_tbl.pl. Numbers are the average of the fastest five runs in
successive 15 runs.

Test cases are listed below.

1_0. About 3,000,000 negative entries are created in pg_statstic
cache by scanning that many distinct columns. It is 3000 tables
* 1001 columns. Pruning scans happen several times while a run
but no entries are removed. This emulates the bloating phase of
cache. catalog_cache_prune_min_age is default (300s).
(access_tbl1.pl)

1_1. Same to 1_0 except that catalog_cache_prune_min_age is 0,
which means turning off.

2_0. Repeatedly access 1001 of the 3,000,000 entries 6000
times. This emulates the stable cache case without having
pruning. catalog_cache_prune_min_age is default (300s).
(access_tbl2.pl)

2_1. Same to 2_0 except that catalog_cache_prune_min_age is 0,
which means turning off.

3_0. Scan over the 3,000,000 entries twice with setting prune_age
to 10s. A run takes about 18 seconds on my box so fair amount
of old entries are removed. This emulates the stable case with
continuous pruning. (access_tbl3.pl)

2_1. Same to 3_0 except that catalog_cache_prune_min_age is 0,
which means turning off.

The result follows.

| master | LRU | Full |Full-mod|
-----|--------+--------+--------+--------+
1_0 | 17.287 | 17.370 | 17.255 | 16.623 |
1_1 | 17.287 | 17.063 | 16.336 | 17.192 |
2_0 | 15.695 | 18.769 | 18.563 | 15.527 |
2_1 | 15.695 | 18.603 | 18.498 | 18.487 |
3_0 | 26.576 | 33.817 | 34.384 | 34.971 |
3_1 | 26.576 | 27.462 | 26.202 | 26.368 |

The result of 2_0 and 2_1 seems strange, but I show you the
numbers at the present.

- Full-scan seems to have the smallest impact when turned off.

- Full-scan-mod seems to perform best in total. (as far as
Full-mod-2_0 is wrong value..)

- LRU doesn't seem to outperform full scanning.

For your information I measured how long pruning takes time.

LRU 318318 out of 2097153 entries in 26ms: 0.08us/entry.
Full-scan 443443 out of 2097153 entreis in 184ms. 0.4us/entry.

LRU is actually fast to remove entries but the difference seems
to be canceled by the complexity of LRU maintenance.

As my conclusion, we should go with the Full-scan or
Full-scan-mod version. I conduct a further overnight test and
will see which is better.

I attached the test script set. It is used in the folling manner.

(start server)
# perl gen_tbl.pl | psql postgres
(stop server)
# sh run.sh 30 > log.txt # 30 is repeat count
# perl process.pl
| master | LRU | Full |Full-mod|
-----|--------+--------+--------+--------+
1_0 | 16.711 | 17.647 | 16.767 | 17.256 |
...

The attached files are follow.

LRU versions patches.
LRU-0001-Add-dlist_move_tail.patch
LRU-0002-Remove-entries-that-haven-t-been-used-for-a-certain-.patch

Fullscn version patch.
FullScan-0001-Remove-entries-that-haven-t-been-used-for-a-certain-.patch

Fullscn-mod version patch.
FullScan-mod-0001-Remove-entries-that-haven-t-been-used-for-a-certain-.patch

test scripts.
test_script.tar.gz

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
LRU-0001-Add-dlist_move_tail.patch text/x-patch 1.2 KB
LRU-0002-Remove-entries-that-haven-t-been-used-for-a-certain-.patch text/x-patch 11.4 KB
FullScan-0001-Remove-entries-that-haven-t-been-used-for-a-certain-.patch text/x-patch 10.1 KB
FullScan-mod-0001-Remove-entries-that-haven-t-been-used-for-a-certain-.patch text/x-patch 10.9 KB
test_script.tar.gz application/octet-stream 2.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2019-03-29 08:25:23 Re: Syntax diagrams in user documentation
Previous Message Peter Eisentraut 2019-03-29 08:22:32 Re: Syntax diagrams in user documentation