RE: Protect syscache from bloating with negative cache entries

From: "Ideriha, Takeshi" <ideriha(dot)takeshi(at)jp(dot)fujitsu(dot)com>
To: 'Kyotaro HORIGUCHI' <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>, "alvherre(at)alvh(dot)no-ip(dot)org" <alvherre(at)alvh(dot)no-ip(dot)org>, "andres(at)anarazel(dot)de" <andres(at)anarazel(dot)de>, "robertmhaas(at)gmail(dot)com" <robertmhaas(at)gmail(dot)com>, "michael(dot)paquier(at)gmail(dot)com" <michael(dot)paquier(at)gmail(dot)com>, "david(at)pgmasters(dot)net" <david(at)pgmasters(dot)net>, "Jim(dot)Nasby(at)bluetreble(dot)com" <Jim(dot)Nasby(at)bluetreble(dot)com>, "craig(at)2ndquadrant(dot)com" <craig(at)2ndquadrant(dot)com>, "tgl(at)sss(dot)pgh(dot)pa(dot)us" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: RE: Protect syscache from bloating with negative cache entries
Date: 2018-10-04 04:27:04
Message-ID: 4E72940DA2BF16479384A86D54D0988A6F1BCB6F@G01JPEXMBKW04
Lists: pgsql-hackers

Hi, thank you for the explanation.

>From: Kyotaro HORIGUCHI [mailto:horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp]
>>
>> Can I confirm my understanding of catcache pruning?
>> syscache_memory_target is the maximum size per CatCache.
>> (Every CatCache has the same maximum.) So the total maximum size of
>> the catalog caches is roughly, or slightly more than, the number of
>> entries in the SysCache array times syscache_memory_target.
>
>Right.
>
>> If that is correct, I think adding the above estimate to the
>> documentation would help database administrators estimate memory usage.
>> My impression is that the current description could be misread as saying
>> that syscache_memory_target is the total size of the catalog cache.
>
>Honestly, I'm not sure that is the right design. However, I don't think providing such a
>formula helps users, since they don't know exactly how many CatCaches and
>brothers live in their server, it is a soft limit, and in the end only a few catalogs, or
>just one, can reach the limit.

Yeah, I agree that that kind of formula is not suited to the documentation.
But if users don't know how many catcaches and brothers are used in postgres,
then how about making syscache_memory_target the total soft limit of the catcaches,
rather than the size limit of each individual catcache? Internally, syscache_memory_target
could be divided by the number of syscaches and applied per cache. A total amount would be
easier to understand for users who don't know the detailed contents of the catalog caches.

Or, if users can tell how many and what kinds of catcaches exist, for instance by using
the system view you provided in the previous email, the current design looks good to me.
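
To make the two interpretations concrete, here is a minimal sketch of the arithmetic. It is only an illustrative model, not code from the patch: the function names, the cache count of 80 and the kB unit are all assumptions made for the example.

```python
# Illustrative model only; nothing here is taken from the patch.
# N_CACHES and the kB unit are hypothetical values chosen for the example.
N_CACHES = 80


def estimated_total_kb(per_cache_target_kb: int, n_caches: int) -> int:
    """Current design: every cache gets the same soft limit, so the
    worst-case total is roughly n_caches times that limit."""
    return per_cache_target_kb * n_caches


def per_cache_share_kb(total_target_kb: int, n_caches: int) -> int:
    """Proposed alternative: the setting is a total, divided evenly
    (integer division) across the caches."""
    if n_caches <= 0:
        raise ValueError("n_caches must be positive")
    return total_target_kb // n_caches


if __name__ == "__main__":
    print(estimated_total_kb(1024, N_CACHES))
    print(per_cache_share_kb(81920, N_CACHES))
```

Since the limit is soft and usually only one or a few caches grow, the first figure is an upper estimate rather than typical usage.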

>The current design is based on the assumption that we would have only one
>extremely-growable cache in one use case.
>
>> Related to the above, I just thought that setting syscache_memory_target
>> per CatCache would make memory usage more efficient.
>
>We could easily have per-cache settings in CatCache, but how do we provide the knobs
>for them? I can only think of solutions that are too much for that.
Agreed.

>> Though I haven't checked whether memory usage varies widely between
>> system catalog caches, the pg_class cache might need more memory than
>> others, and others might need less.
>> But it would be difficult for users to check each CatCache's memory
>> usage and tune it, because right now postgresql doesn't provide a handy
>> way to check them.
>
>I assumed that this would be used without such a means. Someone who suffers from
>syscache bloat can just set this GUC to avoid the bloat. End.
Yeah, I misunderstood the purpose.

>Apart from that, in the current patch, syscache_memory_target is not exact at all in
>the first place, to avoid the overhead of counting the correct size. The major difference
>comes from the size of the cache tuples themselves. But I have come to think that is too
>much to omit.
>
>As a *PoC*, in the attached patch (which applies to current master), the sizes of CTups
>are counted as part of the catcache size.
>
>It also provides pg_catcache_size system view just to give a rough idea of how such
>view looks. I'll consider more on that but do you have any opinion on this?
>
>=# select relid::regclass, indid::regclass, size from pg_syscache_sizes order by size desc;
>    relid     |              indid               |  size
>--------------+----------------------------------+--------
> pg_class     | pg_class_oid_index               | 131072
> pg_class     | pg_class_relname_nsp_index       | 131072
> pg_cast      | pg_cast_source_target_index      |   5504
> pg_operator  | pg_operator_oprname_l_r_n_index  |   4096
> pg_statistic | pg_statistic_relid_att_inh_index |   2048
> pg_proc      | pg_proc_proname_args_nsp_index   |   2048
>..

Great! I like this view.
One extreme idea would be to add all the members printed by CatCachePrintStats(),
which is only enabled with -DCATCACHE_STATS at the moment.
All of the members seem too much for customers who are trying to change the cache limit size.
But some of the members may be useful; for example, cc_hits could indicate that the current
cache limit size is too small.
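
As a rough illustration of how such counters could be read, here is a small sketch that turns them into a hit ratio. The parameter names mirror the cc_searches, cc_hits and cc_neg_hits counters kept under CATCACHE_STATS; the function itself and the sample numbers are hypothetical and not part of the patch or the proposed view.

```python
from typing import Optional


def hit_ratio(cc_hits: int, cc_neg_hits: int, cc_searches: int) -> Optional[float]:
    """Fraction of searches answered from the cache, counting both positive
    and negative hits. Returns None when no searches were recorded."""
    if cc_searches == 0:
        return None
    return (cc_hits + cc_neg_hits) / cc_searches


if __name__ == "__main__":
    # Made-up sample values, only to show the call.
    print(hit_ratio(cc_hits=90, cc_neg_hits=5, cc_searches=100))
```

A ratio that drops after lowering the limit would be one hint that entries are being pruned too aggressively, though this is only one possible reading of the counters.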

>> Another option is that users specify only the total memory target size
>> and postgres dynamically changes each CatCache's memory target size
>> according to a certain metric
>> (which still seems difficult and expensive to develop relative to the benefit).
>> What do you think about this?
>
>Given that few caches bloat at once, its effect is not so different from the current
>design.
Yes, agreed.
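
For reference, the dynamic option mentioned above could look roughly like the sketch below. This is only one possible metric (a share proportional to current usage), chosen for illustration; the function name and the sample figures are hypothetical, and nothing like this exists in the patch.

```python
from typing import Dict


def rebalance_targets(total_target: int, usage: Dict[str, int]) -> Dict[str, int]:
    """Split total_target across caches in proportion to their current usage.
    Falls back to an even split when no cache has any usage yet."""
    if not usage:
        return {}
    total_usage = sum(usage.values())
    if total_usage == 0:
        share = total_target // len(usage)
        return {name: share for name in usage}
    # Integer division: the shares may add up to slightly less than the total.
    return {name: total_target * used // total_usage for name, used in usage.items()}


if __name__ == "__main__":
    # Made-up usage figures, only to show the call.
    print(rebalance_targets(1000, {"pg_class": 300, "pg_proc": 100}))
```

When only one cache is bloated, it receives nearly the whole total, which is why the effect ends up close to the current per-cache design.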

>> As you commented here, the GUC variables syscache_memory_target and
>> syscache_prune_min_age are used for both syscache and relcache (HTAB), right?
>
>Right, just not to add knobs for unclear reasons. Since ...
>
>> Do syscache and relcache have the similar amount of memory usage?
>
>They may differ, but that would not matter much in the case of cache bloat.
>
>> If not, I think introducing a separate GUC variable would be fine.
>> The same goes for syscache_prune_min_age.
>
>I implemented it so that it can easily be replaced if needed, but I'm not sure separating
>them makes a significant difference.
Maybe I was overthinking this, mixing it up with my own development work.

Regards,
Takeshi Ideriha
