Re: Protect syscache from bloating with negative cache entries

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: ideriha(dot)takeshi(at)jp(dot)fujitsu(dot)com
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com, alvherre(at)alvh(dot)no-ip(dot)org, andres(at)anarazel(dot)de, robertmhaas(at)gmail(dot)com, michael(dot)paquier(at)gmail(dot)com, david(at)pgmasters(dot)net, Jim(dot)Nasby(at)bluetreble(dot)com, craig(at)2ndquadrant(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us
Subject: Re: Protect syscache from bloating with negative cache entries
Date: 2018-11-27 10:36:22
Message-ID: 20181127.193622.252197705.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thank you for reviewing.

At Thu, 15 Nov 2018 11:02:10 +0000, "Ideriha, Takeshi" <ideriha(dot)takeshi(at)jp(dot)fujitsu(dot)com> wrote in <4E72940DA2BF16479384A86D54D0988A6F1F4165(at)G01JPEXMBKW04>
> Hello, thank you for updating the patch.
>
>
> >From: Kyotaro HORIGUCHI [mailto:horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp]
> >At Thu, 4 Oct 2018 04:27:04 +0000, "Ideriha, Takeshi"
> ><ideriha(dot)takeshi(at)jp(dot)fujitsu(dot)com> wrote in
> ><4E72940DA2BF16479384A86D54D0988A6F1BCB6F(at)G01JPEXMBKW04>
> >> >As a *PoC*, in the attached patch (which applies to current master),
> >> >size of CTups are counted as the catcache size.
> >> >
> >> >It also provides pg_catcache_size system view just to give a rough
> >> >idea of how such view looks. I'll consider more on that but do you have any opinion
> >on this?
> >> >
> >...
> >> Great! I like this view.
> >> One of the extreme idea would be adding all the members printed by
> >> CatCachePrintStats(), which is only enabled with -DCATCACHE_STATS at this
> >moment.
> >> All of the members seems too much for customers who tries to change
> >> the cache limit size But it may be some of the members are useful
> >> because for example cc_hits would indicate that current cache limit size is too small.
> >
> >The attached introduces four features below. (But the features on relcache and
> >plancache are omitted).
> I haven't looked into the code but I'm going to do it later.
>
> Right now It seems to me that focusing on catalog cache invalidation and its stats a quick route
> to commit this feature.
>
> >1. syscache stats collector (in 0002)
> >
> >Records syscache status consists of the same columns above and "ageclass"
> >information. We could somehow triggering a stats report with signal but we don't want
> >take/send/write the statistics in signal handler. Instead, it is turned on by setting
> >track_syscache_usage_interval to a positive number in milliseconds.
>
> I agreed. Agecalss is important to tweak the prune_min_age.
> Collecting stats is heavy at every stats change
>
> >2. pg_stat_syscache view. (in 0002)
> >
> >This view shows catcache statistics. Statistics is taken only on the backends where
> >syscache tracking is active.
> >
> >> pid | application_name | relname | cache_name
> >| size | ageclass | nentries
> >>
> >------+------------------+----------------+-----------------------------------+----------
> >+-------------------------+---------------------------
> >> 9984 | psql | pg_statistic | pg_statistic_relid_att_inh_index |
> >12676096 | {30,60,600,1200,1800,0} | {17660,17310,55870,0,0,0}
> >
> >Age class is the basis of catcache truncation mechanism and shows the distribution
> >based on elapsed time since last access. As I didn't came up an appropriate way, it is
> >represented as two arrays. Ageclass stores maximum age for each class in seconds.
> >Nentries holds entry numbers correnponding to the same element in ageclass. In the
> >above example,
> >
> > age class : # of entries in the cache
> > up to 30s : 17660
> > up to 60s : 17310
> > up to 600s : 55870
> > up to 1200s : 0
> > up to 1800s : 0
> > more longer : 0
> >
> > The ageclass is {0, 0.05, 0.1, 1, 2, 3}th multiples of cache_prune_min_age on the
> >backend.
>
> I just thought that the pair of ageclass and nentries can be represented as
> json or multi-dimensional array but in virtual they are all same and can be converted each other
> using some functions. So I'm not sure which representaion is better one.

Multi dimentional array in any style sounds reasonable. Maybe
array is preferable in system views as it is a basic type than
JSON. In the attached, it looks like the follows:

=# select * from pg_stat_syscache where ntuples > 100;
-[ RECORD 1 ]--------------------------------------------------
pid | 1817
relname | pg_class
cache_name | pg_class_oid_index
size | 2048
ntuples | 189
searches | 1620
hits | 1431
neg_hits | 0
ageclass | {{30,189},{60,0},{600,0},{1200,0},{1800,0},{0,0}}
last_update | 2018-11-27 19:22:00.74026+09

> >3. non-transactional GUC setting (in 0003)
> >
> >It allows setting GUC variable set by the action GUC_ACTION_NONXACT(the name
> >requires condieration) survive beyond rollback. It is required by remote guc setting to
> >work sanely. Without the feature a remote-set value within a trasction will disappear
> >involved in rollback. The only local interface for the NONXACT action is
> >set_config(name, value, is_local=false, is_nonxact = true). pg_set_backend_guc()
> >below works on this feature.
>
> TBH, I'm not familiar with around this and I may be missing something.
> In order to change the other backend's GUC value,
> is ignoring transactional behevior always necessary? When transaction of GUC setting
> is failed and rollbacked, if the error message is supposeed to be reported I thought
> just trying the transaction again is enough.

The target backend can be running frequent transaction. The
invoker backend cannot know whether the remote change happend
during a transaction and whether the transaction if any is
committed or aborted, no error message sent to invoker backend.
We could wait for the end of a trasaction but that doesn't work
with long transactions.

Maybe we don't need the feature in GUC system but adding another
similar feature doesn't seem reasonable. This would be useful for
some other tracking features.

> >4. pg_set_backend_guc() function.
> >
> >Of course syscache statistics recording consumes significant amount of time so it
> >cannot be turned on usually. On the other hand since this feature is turned on by GUC,
> >it is needed to grab the active client connection to turn on/off the feature(but we
> >cannot). Instead, I provided a means to change GUC variables in another backend.
> >
> >pg_set_backend_guc(pid, name, value) sets the GUC variable "name"
> >on the backend "pid" to "value".
> >
> >
> >
> >With the above tools, we can inspect catcache statistics of seemingly bloated process.
> >
> >A. Find a bloated process pid using ps or something.
> >
> >B. Turn on syscache stats on the process.
> > =# select pg_set_backend_guc(9984, 'track_syscache_usage_interval', '10000');
> >
> >C. Examine the statitics.
> >
> >=# select pid, relname, cache_name, size from pg_stat_syscache order by size desc
> >limit 3;
> > pid | relname | cache_name | size
> >------+--------------+----------------------------------+----------
> > 9984 | pg_statistic | pg_statistic_relid_att_inh_index | 32154112
> > 9984 | pg_cast | pg_cast_source_target_index | 4096
> > 9984 | pg_operator | pg_operator_oprname_l_r_n_index | 4096
> >
> >
> >=# select * from pg_stat_syscache where cache_name =
> >'pg_statistic_relid_att_inh_index'::regclass;
> >-[ RECORD 1 ]---------------------------------
> >pid | 9984
> >relname | pg_statistic
> >cache_name | pg_statistic_relid_att_inh_index
> >size | 11026176
> >ntuples | 77950
> >searches | 77950
> >hits | 0
> >neg_hits | 0
> >ageclass | {30,60,600,1200,1800,0}
> >nentries | {17630,16950,43370,0,0,0}
> >last_update | 2018-10-17 15:58:19.738164+09
>
> The output of this view seems good to me.
>
> I can imagine this use case. Does the use case of setting GUC locally never happen?
> I mean can the setting be locally changed?

Syscahe grows through a life of a backend/session. No other
client cannot connect to it at the same time. So the variable
must be set at the start of a backend using ALTER USER/DATABASE,
or the client itself is obliged to deliberitely turn on the
feature at a convenient time. I suppose that in most use cases
one wants to turn on this feature after he sees another session
is eating memory more and more.

The attached is the rebased version that has multidimentional
ageclass.

Thank you for the comments in the next mail but sorry that I'll
address them later.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v6-0001-Remove-entries-that-haven-t-been-used-for-a-certain-.patch text/x-patch 15.2 KB
v6-0002-Syscache-usage-tracking-feature.patch text/x-patch 35.0 KB
v6-0003-Remote-GUC-setting-feature-and-non-xact-GUC-config.patch text/x-patch 45.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavan Deolasee 2018-11-27 11:18:48 Re: MERGE SQL statement for PG12
Previous Message amul sul 2018-11-27 10:24:43 Re: vacuum and autovacuum - is it good to configure the threshold at TABLE LEVEL?