Re: Protect syscache from bloating with negative cache entries

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: bruce(at)momjian(dot)us
Cc: tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com, GavinFlower(at)archidevsys(dot)co(dot)nz, robertmhaas(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, ideriha(dot)takeshi(at)jp(dot)fujitsu(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, alvherre(at)alvh(dot)no-ip(dot)org, andres(at)anarazel(dot)de, michael(dot)paquier(at)gmail(dot)com, david(at)pgmasters(dot)net, Jim(dot)Nasby(at)bluetreble(dot)com, craig(at)2ndquadrant(dot)com
Subject: Re: Protect syscache from bloating with negative cache entries
Date: 2019-01-24 09:39:24
Message-ID: 20190124.183924.13894464.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

Thank you for the comments.

At Wed, 23 Jan 2019 18:21:45 -0500, Bruce Momjian <bruce(at)momjian(dot)us> wrote in <20190123232145(dot)GA8334(at)momjian(dot)us>
> On Wed, Jan 23, 2019 at 05:35:02PM +0900, Kyotaro HORIGUCHI wrote:
> > At Mon, 21 Jan 2019 17:22:55 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20190121(dot)172255(dot)226467552(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> > > An option is an additional PGPROC member and interface functions.
> > >
> > > struct PGPROC
> > > {
> > > ...
> > > int syscahe_usage_track_interval; /* track interval, 0 to disable */
> > >
> > > =# select syscahce_usage_track_add(<pid>, <intvl>[, <repetition>]);
> > > =# select syscahce_usage_track_remove(2134);
> > >
> > >
> > > Or, just provide an one-shot triggering function.
> > >
> > > =# select syscahce_take_usage_track(<pid>);
> > >
> > > This can use both a similar PGPROC variable or SendProcSignal()
> > > but the former doesn't fire while idle time unless using timer.
> >
> > The attached is revised version of this patchset, where the third
> > patch is the remote setting feature. It uses static shared memory.
> >
> > =# select pg_backend_catcache_stats(<pid>, <millis>);
> >
> > Activates or changes catcache stats feature on the backend with
> > PID. (The name should be changed to .._syscache_stats, though.)
> > It is far smaller than the remote-GUC feature. (It contains a
> > part that should be in the previous patch. I will fix it later.)
>
> I have a few questions to make sure we have not made the API too
> complex. First, for syscache_prune_min_age, that is the minimum age
> that we prune, and entries could last twice that long. Is there any
> value to doing the scan at 50% of the age so that the
> syscache_prune_min_age is the max age? For example, if our age cutoff
> is 10 minutes, we could scan every 5 minutes so 10 minutes would be the
> maximum age kept.

(Looking into the patch...) Actually three times, not twice. That
is because I gave weight to access frequency. I think it is
reasonable that entries accessed more frequently get a longer
life (within a certain limit). The original problem here was
negative cache entries that are created but never accessed.
However, there is no firm reason for the number of steps (3).
There might be no difference if the extra lifetime were at most
one syscache_prune_min_age, or even if there were no extra time.
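
To illustrate how an entry can live up to three times the minimum
age, here is a minimal sketch in C. It is not the code from the
patch: the struct, the function names and the cap of 2 on the access
credit are assumptions made for illustration, and it assumes one
prune scan per syscache_prune_min_age period.

```c
#include <assert.h>
#include <stdbool.h>

/* All names here are hypothetical; this is not the patch's code. */
#define MAX_ACCESS_CREDIT 2     /* assumed cap: 1 base period + 2 extra */

typedef struct CacheEntry
{
    long last_access;   /* time of last access or last reprieve, seconds */
    int  naccess;       /* access credit, capped at MAX_ACCESS_CREDIT */
    bool live;          /* false once the entry has been evicted */
} CacheEntry;

/* Record an access: refresh the timestamp and earn credit up to the cap. */
static void
entry_touch(CacheEntry *e, long now)
{
    e->last_access = now;
    if (e->naccess < MAX_ACCESS_CREDIT)
        e->naccess++;
}

/* One prune pass over a single entry; returns true if it was evicted. */
static bool
entry_prune(CacheEntry *e, long now, long min_age)
{
    assert(min_age > 0);

    if (!e->live || now - e->last_access < min_age)
        return false;           /* already gone, or still young enough */

    if (e->naccess > 0)
    {
        e->naccess--;           /* spend one credit ... */
        e->last_access = now;   /* ... for one more period of life */
        return false;
    }

    e->live = false;            /* no credit left: evict */
    return true;
}

/* Count the scans (one per min_age) an idle entry survives until eviction. */
static int
scans_until_evicted(int touches, long min_age)
{
    CacheEntry e = {0, 0, true};
    long       now = 0;
    int        scans = 0;

    for (int i = 0; i < touches; i++)
        entry_touch(&e, now);

    while (e.live)
    {
        now += min_age;
        scans++;
        entry_prune(&e, now, min_age);
    }
    return scans;
}
```

In this sketch a never-accessed negative entry is evicted at the
first scan (scans_until_evicted(0, 600) returns 1), while an entry
that earned the full credit goes at the third
(scans_until_evicted(5, 600) returns 3).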

> Second, when would you use syscache_memory_target != 0?

It was suggested upthread: we sometimes want to keep a known
amount of cache even when expiration is activated.

> If you had
> syscache_prune_min_age really fast, e.g. 10 seconds? What is the
> use-case for this? You have a query that touches 10k objects, and then
> the connection stays active but doesn't touch many of those 10k objects,
> and you want it cleaned up in seconds instead of minutes? (I can't see
> why you would not clean up all unreferenced objects after _minutes_ of
> disuse, but removing them after seconds of disuse seems undesirable.)
> What are the odds you would retain the entires you want with a fast
> target?

Are you asking the reason for the unit? It is just because the
value won't be very large even in seconds, 3600 seconds at most.
Although I don't think such a short duration setting is
meaningful in the real world, I don't think we need to prohibit
it either. (Actually it is useful for testing :p) Another reason
is that GUC_UNIT_MIN doesn't seem very common; it is used by only
two variables, log_rotation_age and old_snapshot_threshold.

> What is the value of being able to change a specific backend's stat
> interval? I don't remember any other setting having this ability.

As mentioned upthread, collecting the statistics takes
significant time, so I believe no one is willing to turn it on
at all times. As a result it would be useless, because it could
not be turned on in an active backend at the moment it actually
gets bloated. So I wanted to provide a remote switching feature.
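
As a rough illustration of the remote switching idea, here is a
self-contained sketch in C. A plain static array stands in for the
fixed-size shared memory, and every name in it is hypothetical; the
actual layout used by the patch is not shown in this message. Real
shared memory would also need locking or atomic access, which the
sketch omits.

```c
#include <assert.h>

#define MAX_BACKENDS 8      /* assumed fixed size, as with static shared memory */

/* Hypothetical per-backend slot; a stand-in for a shared-memory array. */
typedef struct StatsRequestSlot
{
    int pid;            /* 0 while the slot is unused */
    int interval_ms;    /* stats interval in milliseconds, 0 = disabled */
} StatsRequestSlot;

static StatsRequestSlot slots[MAX_BACKENDS];

/* Called by a backend at startup; returns its slot index, or -1 if full. */
static int
register_backend(int pid)
{
    assert(pid > 0);
    for (int i = 0; i < MAX_BACKENDS; i++)
    {
        if (slots[i].pid == 0)
        {
            slots[i].pid = pid;
            slots[i].interval_ms = 0;
            return i;
        }
    }
    return -1;
}

/* Called from any other session; returns 1 if the target pid was found. */
static int
set_stats_interval(int pid, int interval_ms)
{
    for (int i = 0; i < MAX_BACKENDS; i++)
    {
        if (slots[i].pid == pid)
        {
            slots[i].interval_ms = interval_ms;
            return 1;
        }
    }
    return 0;
}

/* Called by the target backend at a safe point to pick up the setting. */
static int
my_stats_interval(int pid)
{
    for (int i = 0; i < MAX_BACKENDS; i++)
    {
        if (slots[i].pid == pid)
            return slots[i].interval_ms;
    }
    return 0;               /* not registered: treat as disabled */
}
```

The point of the design is that the requesting session only writes a
small value into the target's slot, and the target backend notices it
the next time it checks, so no general remote-GUC machinery is
needed.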

I also thought that there are other features that would be
useful if they could be turned on remotely, hence the remote-GUC
feature, but it was too complex...

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
