Re: Protect syscache from bloating with negative cache entries

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: robertmhaas(at)gmail(dot)com
Cc: andres(at)anarazel(dot)de, tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com, alvherre(at)2ndquadrant(dot)com, tomas(dot)vondra(at)2ndquadrant(dot)com, bruce(at)momjian(dot)us, tgl(at)sss(dot)pgh(dot)pa(dot)us, ideriha(dot)takeshi(at)jp(dot)fujitsu(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, michael(dot)paquier(at)gmail(dot)com, david(at)pgmasters(dot)net, craig(at)2ndquadrant(dot)com
Subject: Re: Protect syscache from bloating with negative cache entries
Date: 2019-02-25 06:26:53
Message-ID: 20190225.152653.211323929.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

At Wed, 20 Feb 2019 13:09:08 -0500, Robert Haas <robertmhaas(at)gmail(dot)com> wrote in <CA+TgmoZXw+SwK_9Tp=wLqZDstW_X+Ant=rd7K+q4zmYONPuL=w(at)mail(dot)gmail(dot)com>
> On Tue, Feb 19, 2019 at 11:15 PM Kyotaro HORIGUCHI
> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> > Difference from v15:
> >
> > Removed AllocSet accounting stuff. We use approximate memory
> > size for catcache.
> >
> > Removed prune-by-number(or size) stuff.
> >
> > Addressing comments from Tsunakawa-san and Ideriha-san.
> >
> > Separated catcache monitoring feature. (Removed from this set)
> > (But it is crucial to check this feature...)
> >
> > Is this small enough ?
>
> The commit message in 0002 says 'This also can put a hard limit on the
> number of catcache entries.' but neither of the GUCs that you've
> documented have that effect. Is that a leftover from a previous
> version?

Mmm, right. Thank you for pointing that out, and sorry for the
mistake. I have fixed it, along with another error in the commit
message, in my repository. The fix will appear in the next version.

| Remove entries that haven't been used for a certain time
|
| Catcache entries can be left unused for several reasons. It is not
| desirable that they eat up memory. With this patch, entries that
| haven't been used for a certain time are removed before enlarging
| the hash array.

> I'd like to see some evidence that catalog_cache_memory_target has any
> value, vs. just always setting it to zero. I came up with the
> following somewhat artificial example that shows that it might have
> value.
>
> rhaas=# create table foo (a int primary key, b text) partition by hash (a);
> [rhaas pgsql]$ perl -e 'for (0..9999) { print "CREATE TABLE foo$_
> PARTITION OF foo FOR VALUES WITH (MODULUS 10000, REMAINDER $_);\n"; }'
> | psql
>
> First execution of 'select * from foo' in a brand new session takes
> about 1.9 seconds; subsequent executions take about 0.7 seconds. So,
> if catalog_cache_memory_target were set to a high enough value to
> allow all of that stuff to remain in cache, we could possibly save
> about 1.2 seconds coming off the blocks after a long idle period.
> That might be enough to justify having the parameter. But I'm not
> quite sure how high the value would need to be set to actually get the
> benefit in a case like that, or what happens if you set it to a value
> that's not quite high enough.

It is artificial (or rather, such a query usually won't be
executed repeatedly in a session), but in any case the workloads
that benefit from catalog_cache_memory_target would be somewhat
extreme.

I think the two parameters should be tuned in the following
steps.

- If the default settings satisfy you, leave them alone. (as a
  general suggestion)

- If you find that your (syscache-sensitive) queries are executed
  at rather long intervals, say 10-30 minutes, and they run
  slower than at shorter intervals, consider increasing
  catalog_cache_prune_min_age to about the query interval. If you
  don't suffer from process bloat, that's fine.

- If you find that a process bloats too much and you (intuitively)
  suspect the cause is the system cache, set
  catalog_cache_prune_min_age to a rather short value, say 1
  minute, and set catalog_cache_memory_target to the amount of
  memory allowable for each process. Memory usage will then
  stabilize at some (uncertain) amount above the target.

Alternatively, if you want to determine the settings in advance
with a rather strict limit, and if the monitoring feature were
part of this patchset, you could check how much memory is used
for the query.

$ perl -e 'print "set track_catalog_cache_usage_interval = 1000;\n"; for (0..9999) { print "CREATE TABLE foo$_ PARTITION OF foo FOR VALUES WITH (MODULUS 10000, REMAINDER $_);\n"; } print "select sum(size) from pg_stat_syscache";' | psql

sum
---------
7088523

In this case, set catalog_cache_memory_target to 7MB and
catalog_cache_prune_min_age to '1min'. Since the target is not
enforced strictly (it is checked only at each resize), possibly
you need further tuning.
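For reference, the resulting settings from the example above would look like this in postgresql.conf (catalog_cache_prune_min_age and catalog_cache_memory_target are the GUCs added by this patchset; they do not exist in core PostgreSQL):

```
catalog_cache_prune_min_age = '1min'   # prune entries unused for a minute
catalog_cache_memory_target = '7MB'    # rounded up from the 7088523 bytes
                                       # reported by pg_stat_syscache
```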

> that's not quite high enough. I think it might be good to play around
> some more with cases like this, just to get a feeling for how much
> time you can save in exchange for how much memory.

All kinds of tuning involve that trade-off, I think.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
