Re: Protect syscache from bloating with negative cache entries

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: ideriha(dot)takeshi(at)jp(dot)fujitsu(dot)com
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com, alvherre(at)alvh(dot)no-ip(dot)org, andres(at)anarazel(dot)de, robertmhaas(at)gmail(dot)com, michael(dot)paquier(at)gmail(dot)com, david(at)pgmasters(dot)net, Jim(dot)Nasby(at)bluetreble(dot)com, craig(at)2ndquadrant(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us
Subject: Re: Protect syscache from bloating with negative cache entries
Date: 2018-09-13 12:40:59
Message-ID: 20180913.214059.30899759.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

Hello. Thank you for looking this.

At Wed, 12 Sep 2018 05:16:52 +0000, "Ideriha, Takeshi" <ideriha(dot)takeshi(at)jp(dot)fujitsu(dot)com> wrote in <4E72940DA2BF16479384A86D54D0988A6F197012(at)G01JPEXMBKW04>
> Hi,
>
> >Subject: Re: Protect syscache from bloating with negative cache entries
> >
> >Hello. The previous v4 patchset was just broken.
>
> >Somehow the 0004 was merged into the 0003 and applying 0004 results in failure. I
> >removed 0004 part from the 0003 and rebased and repost it.
>
> I have some questions about syscache and relcache pruning
> though they may be discussed at upper thread or out of point.
>
> Can I confirm about catcache pruning?
> syscache_memory_target is the max figure per CatCache.
> (Any CatCache has the same max value.)
> So the total max size of catalog caches is estimated around or
> slightly more than # of SysCache array times syscache_memory_target.

Right.

> If correct, I'm thinking writing down the above estimation to the document
> would help db administrators with estimation of memory usage.
> Current description might lead misunderstanding that syscache_memory_target
> is the total size of catalog cache in my impression.

Honestly I'm not sure that is the right design. Howerver, I don't
think providing such formula to users helps users, since they
don't know exactly how many CatCaches and brothres live in their
server and it is a soft limit, and finally only few or just one
catalogs can reach the limit.

The current design based on the assumption that we would have
only one extremely-growable cache in one use case.

> Related to the above I just thought changing sysycache_memory_target per CatCache
> would make memory usage more efficient.

We could easily have per-cache settings in CatCache, but how do
we provide the knobs for them? I can guess only too much
solutions for that.

> Though I haven't checked if there's a case that each system catalog cache memory usage varies largely,
> pg_class cache might need more memory than others and others might need less.
> But it would be difficult for users to check each CatCache memory usage and tune it
> because right now postgresql hasn't provided a handy way to check them.

I supposed that this is used without such a means. Someone
suffers syscache bloat just can set this GUC to avoid the
bloat. End.

Apart from that, in the current patch, syscache_memory_target is
not exact at all in the first place to avoid overhead to count
the correct size. The major difference comes from the size of
cache tuple itself. But I came to think it is too much to omit.

As a *PoC*, in the attached patch (which applies to current
master), size of CTups are counted as the catcache size.

It also provides pg_catcache_size system view just to give a
rough idea of how such view looks. I'll consider more on that but
do you have any opinion on this?

=# select relid::regclass, indid::regclass, size from pg_syscache_sizes order by size desc;
relid | indid | size
-------------------------+-------------------------------------------+--------
pg_class | pg_class_oid_index | 131072
pg_class | pg_class_relname_nsp_index | 131072
pg_cast | pg_cast_source_target_index | 5504
pg_operator | pg_operator_oprname_l_r_n_index | 4096
pg_statistic | pg_statistic_relid_att_inh_index | 2048
pg_proc | pg_proc_proname_args_nsp_index | 2048
..

> Another option is that users only specify the total memory target size and postgres
> dynamically change each CatCache memory target size according to a certain metric.
> (, which still seems difficult and expensive to develop per benefit)
> What do you think about this?

Given that few caches bloat at once, it's effect is not so
different from the current design.

> As you commented here, guc variable syscache_memory_target and
> syscache_prune_min_age are used for both syscache and relcache (HTAB), right?

Right, just not to add knobs for unclear reasons. Since ...

> Do syscache and relcache have the similar amount of memory usage?

They may be different but would make not so much in the case of
cache bloat.

> If not, I'm thinking that introducing separate guc variable would be fine.
> So as syscache_prune_min_age.

I implemented that so that it is easily replaceable in case, but
I'm not sure separating them makes significant difference..

Thanks for the opinion, I'll put consideration on this more.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v5-2-PoC-0001-Remove-entries-that-haven-t-been-used-for-a-certain-.patch text/x-patch 28.1 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Kuzmenkov 2018-09-13 13:01:13 Re: Index Skip Scan
Previous Message Stephen Frost 2018-09-13 11:45:49 Re: Collation versioning