Re: Protect syscache from bloating with negative cache entries

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: andres(at)anarazel(dot)de
Cc: robertmhaas(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, michael(dot)paquier(at)gmail(dot)com, david(at)pgmasters(dot)net, Jim(dot)Nasby(at)bluetreble(dot)com, craig(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Protect syscache from bloating with negative cache entries
Date: 2018-03-07 08:25:45
Message-ID: 20180307.172545.231972323.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Oops! The previous ptach contained garbage printing in debugging
output.

The attached is the new version without the garbage. Addition to
it, I changed my mind to use DEBUG1 for the debug message since
the frequency is quite low.

No changes in the following cited previous mail.

At Wed, 07 Mar 2018 16:19:23 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20180307(dot)161923(dot)178158050(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
==================
Hello.

Thank you for the discussion, and sorry for being late to come.

At Thu, 1 Mar 2018 12:26:30 -0800, Andres Freund <andres(at)anarazel(dot)de> wrote in <20180301202630(dot)2s6untij2x5hpksn(at)alap3(dot)anarazel(dot)de>
> Hi,
>
> On 2018-03-01 15:19:26 -0500, Robert Haas wrote:
> > On Thu, Mar 1, 2018 at 3:01 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > On 2018-03-01 14:49:26 -0500, Robert Haas wrote:
> > >> On Thu, Mar 1, 2018 at 2:29 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > >> > Right. Which might be very painful latency wise. And with poolers it's
> > >> > pretty easy to get into situations like that, without the app
> > >> > influencing it.
> > >>
> > >> Really? I'm not sure I believe that. You're talking perhaps a few
> > >> milliseconds - maybe less - of additional latency on a connection
> > >> that's been idle for many minutes.
> > >
> > > I've seen latency increases in second+ ranges due to empty cat/sys/rel
> > > caches.
> >
> > How is that even possible unless the system is grossly overloaded?
>
> You just need to have catalog contents out of cache and statements
> touching a few relations, functions, etc. Indexscan + heap fetch
> latencies do add up quite quickly if done sequentially.
>
>
> > > I don't think that'd quite address my concern. I just don't think that
> > > the granularity (drop all entries older than xxx sec at the next resize)
> > > is right. For one I don't want to drop stuff if the cache size isn't a
> > > problem for the current memory budget. For another, I'm not convinced
> > > that dropping entries from the current "generation" at resize won't end
> > > up throwing away too much.
> >
> > I think that a fixed memory budget for the syscache is an idea that
> > was tried many years ago and basically failed, because it's very easy
> > to end up with terrible eviction patterns -- e.g. if you are accessing
> > 11 relations in round-robin fashion with a 10-relation cache, your
> > cache nets you a 0% hit rate but takes a lot more maintenance than
> > having no cache at all. The time-based approach lets the cache grow
> > with no fixed upper limit without allowing unused entries to stick
> > around forever.
>
> I definitely think we want a time based component to this, I just want
> to not prune at all if we're below a certain size.
>
>
> > > If we'd a guc 'syscache_memory_target' and we'd only start pruning if
> > > above it, I'd be much happier.
> >
> > It does seem reasonable to skip pruning altogether if the cache is
> > below some threshold size.
>
> Cool. There might be some issues making that check performant enough,
> but I don't have a good intuition on it.

So..

- Now it gets two new GUC variables named syscache_prune_min_age
and syscache_memory_target. The former is the replacement of
the previous magic number 600 and defaults to the same
number. The latter prevens syscache pruning until exceeding the
size and defaults to 0, means that pruning is always
considered. Documentation for the two variables are also
added.

- Revised the pointed mysterious comment for
CatcacheCleanupOldEntries and some comments are added.

- Fixed the name of the variables for CATCACHE_STATS to be more
descriptive, and added some comments for the code.

The catcache entries accessed within the current transaction
won't be pruned so theoretically a long transaction can bloat
catcache. But I believe it is quite rare, or at least this saves
the most other cases.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v3-0001-Remove-entries-that-haven-t-been-used-for-a-certain-.patch text/x-patch 13.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Arthur Zakirov 2018-03-07 08:55:54 Re: [PROPOSAL] Shared Ispell dictionaries
Previous Message Tsunakawa, Takayuki 2018-03-07 08:23:50 RE: Temporary tables prevent autovacuum, leading to XID wraparound