Re: Protect syscache from bloating with negative cache entries

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: robertmhaas(at)gmail(dot)com
Cc: andres(at)anarazel(dot)de, tgl(at)sss(dot)pgh(dot)pa(dot)us, michael(dot)paquier(at)gmail(dot)com, david(at)pgmasters(dot)net, Jim(dot)Nasby(at)bluetreble(dot)com, craig(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Protect syscache from bloating with negative cache entries
Date: 2017-12-19 08:31:38
Message-ID: 20171219.173138.46350691.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

At Mon, 18 Dec 2017 12:14:24 -0500, Robert Haas <robertmhaas(at)gmail(dot)com> wrote in <CA+TgmoaWLBzUasvVs-q=dfBr3pLWSUCQnbqLk-MT7iX4eyrinA(at)mail(dot)gmail(dot)com>
> On Mon, Dec 18, 2017 at 11:46 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > I'm not 100% convinced either - but I also don't think it matters all
> > that terribly much. As long as the overall hash hit rate is decent,
> > minor increases in the absolute number of misses don't really matter
> > that much for syscache imo. I'd personally go for something like:
> >
> > 1) When about to resize, check if there's entries of a generation -2
> > around.
> >
> > Don't resize if more than 15% of entries could be freed. Also, stop
> > reclaiming at that threshold, to avoid unnecessary purging cache
> > entries.
> >
> > Using two generations allows a bit more time for cache entries to
> > marked as fresh before resizing next.
> >
> > 2) While resizing increment generation count by one.
> >
> > 3) Once a minute, increment generation count by one.
> >
> >
> > The one thing I'm not quite have a good handle upon is how much, and if
> > any, cache reclamation to do at 3). We don't really want to throw away
> > all the caches just because a connection has been idle for a few
> > minutes, in a connection pool that can happen occasionally. I think I'd
> > for now *not* do any reclamation except at resize boundaries.
>
> My starting inclination was almost the opposite. I think that you
> might be right that a minute or two of idle time isn't sufficient
> reason to flush our local cache, but I'd be inclined to fix that by
> incrementing the generation count every 10 minutes or so rather than
> every minute, and still flush things more then 1 generation old. The
> reason for that is that I think we should ensure that the system
> doesn't sit there idle forever with a giant cache. If it's not using
> those cache entries, I'd rather have it discard them and rebuild the
> cache when it becomes active again.

I see three kinds of syscache entries.

A. An entry for an actually existing object.

This is a syscache entry in the literal sense. An entry of this kind
need not be removed, but it can be removed after it has gone unused
for a certain period of time.

B. An entry for an object that once existed but no longer does.

This can be removed at any time after the object is removed, and it
is a main cause of the stats bloat or relcache bloat that motivated
this thread. We can tell whether entries of this kind are removable
by using the cache invalidation mechanism (the patch upthread).

We can queue the OIDs that identify the entries to remove, then
actually remove them at the next resize. (The queue itself could
become another cause of bloat, so we could forcibly flush a hash when
the OID list grows longer than some threshold.)

C. An entry for an object that simply does not exist.

I'm not sure how we should treat this, since whether an entry of
this kind is needed depends purely on whether it will be accessed
at some point. But we could apply the same assumption as for A.
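
To make the handling of B concrete, here is a minimal sketch of the
"queue the OIDs, remove later, force a flush past a threshold" idea.
It is a toy model, not code from the patch upthread: ToyCache,
toy_note_removed, toy_flush_pending and the tiny MAX_PENDING_OIDS
value are all hypothetical names and numbers chosen for illustration,
and a real catcache would use its hash buckets instead of a flat array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Everything here is hypothetical; none of it is a PostgreSQL API. */
#define CACHE_SLOTS      64
#define MAX_PENDING_OIDS 4      /* arbitrary; kept tiny for illustration */

typedef unsigned int Oid;

typedef struct CacheEntry
{
    Oid  reloid;                /* object this entry belongs to */
    bool in_use;                /* false means the slot is free */
} CacheEntry;

typedef struct ToyCache
{
    CacheEntry slots[CACHE_SLOTS];
    Oid        pending[MAX_PENDING_OIDS];  /* OIDs of removed objects */
    size_t     npending;
} ToyCache;

/*
 * Remove every entry whose OID is queued, then empty the queue.
 * Returns the number of entries removed. This is what would run at
 * the next resize, or earlier when the queue gets too long.
 */
static size_t
toy_flush_pending(ToyCache *cache)
{
    size_t removed = 0;

    for (size_t i = 0; i < CACHE_SLOTS; i++)
    {
        if (!cache->slots[i].in_use)
            continue;
        for (size_t j = 0; j < cache->npending; j++)
        {
            if (cache->slots[i].reloid == cache->pending[j])
            {
                cache->slots[i].in_use = false;
                removed++;
                break;
            }
        }
    }
    cache->npending = 0;
    return removed;
}

/*
 * Called when an invalidation reports that an object was removed.
 * Normally only queues the OID; flushes at once when the queue is full.
 * Returns the number of entries removed by a forced flush, else 0.
 */
static size_t
toy_note_removed(ToyCache *cache, Oid reloid)
{
    assert(cache->npending < MAX_PENDING_OIDS);
    cache->pending[cache->npending++] = reloid;

    if (cache->npending >= MAX_PENDING_OIDS)
        return toy_flush_pending(cache);
    return 0;
}
```

In this sketch a stale entry keeps occupying its slot until either the
resize path calls toy_flush_pending() or the queue fills up, which is
the trade-off described above: the queue is cheap to append to, but it
must be bounded so that it does not become a source of bloat itself.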

> Now, I also see that your point about trying to clean up before
> resizing. That does seem like a good idea, although we have to be
> careful not to be too eager to clean up there, or we'll just result in
> artificially limiting the cache size when it's unwise to do so. But I
> guess that's what you meant by "Also, stop reclaiming at that
> threshold, to avoid unnecessary purging cache entries." I think the
> idea you are proposing is that:
>
> 1. The first time we are due to expand the hash table, we check
> whether we can forestall that expansion by doing a cleanup; if so, we
> do that instead.
>
> 2. After that, we just expand.
>
> That seems like a fairly good idea, although it might be a better idea
> to allow cleanup if enough time has passed. If we hit the expansion
> threshold twice an hour apart, there's no reason not to try cleanup
> again.

A session that intermittently executes queries which each run in a
very short time could be considered an example workload where
cleanup with such criteria is unwelcome. But syscache won't
bloat in that case.

> Generally, the way I'm viewing this is that a syscache entry means
> paying memory to save CPU time. Each 8kB of memory we use to store
> system cache entries is one less block we have for the OS page cache
> to hold onto our data blocks. If we had an oracle (the kind from

Sure

> Delphi, not Redwood City) that told us with perfect accuracy when to
> discard syscache entries, it would throw away syscache entries

Except for B above. That logic seems somewhat alien to the
time-based cleanup, but it can serve as the measure against quick
bloat of some syscaches.

> whenever the marginal execution-time performance we could buy from
> another 8kB in the page cache is greater than the marginal
> execution-time performance we could buy from those syscache entries.
> In reality, it's hard to know which of those things is of greater
> value. If the system isn't meaningfully memory-constrained, we ought
> to just always hang onto the syscache entries, as we do today, but
> it's hard to know that. I think the place where this really becomes a
> problem is on system with hundreds of connections + thousands of
> tables + connection pooling; without some back-pressure, every backend
> eventually caches everything, putting the system under severe memory
> pressure for basically no performance gain. Each new use of the
> connection is probably for a limited set of tables, and only those
> tables really syscache entries; holding onto things used long in the
> past doesn't save enough to justify the memory used.

Agreed. The following is the overall picture of the measure against
syscache bloat, taking "quick bloat" into account. (I still think it
is wanted in some situations.)

1. When the removal of an object makes some syscache entries stale,
just queue its OID into, for example, a recently_removed_relations
OID hash. (Which entries are stale cannot be checked without
scanning the whole hash.)

2. If the number of entries in the OID hash reaches 1000 or 10000
(hmm, quite arbitrary), immediately clean up the syscaches that
accept/need removed-reloid cleanup. (A separate OID hash might be
needed for each target cache to avoid redundant scans, or to get
rid of a kind of generation management in the OID hash.)

3.
> 1. The first time we are due to expand the hash table, we check
> whether we can forestall that expansion by doing a cleanup; if so, we
> do that instead.

And if there are any entries in the removed-reloid hash, they are
taken into account during the cleanup.

4.
> 2. After that, we just expand.
>
> That seems like a fairly good idea, although it might be a better idea
> to allow cleanup if enough time has passed. If we hit the expansion
> threshold twice an hour apart, there's no reason not to try cleanup
> again.

1 + 2 and 3 + 4 can be implemented as separate patches and I'll
do the latter first.
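
For 3 + 4, the decision made when a hash is due to expand could look
roughly like the sketch below. This is only an illustration under
stated assumptions: ToyHash, toy_on_full and CLEANUP_RETRY_SECS are
hypothetical names, the one-hour retry interval is taken from the
"twice an hour apart" example quoted above, and nstale stands for
whatever count of removable entries the real criteria (idle time,
removed-reloid hash) would produce.

```c
#include <stdbool.h>
#include <stddef.h>

/* Everything here is hypothetical; none of it is a PostgreSQL API. */
#define CLEANUP_RETRY_SECS 3600L    /* assumed: retry cleanup after an hour */

typedef enum
{
    ACTION_NONE,        /* hash is not full, nothing to do */
    ACTION_CLEANUP,     /* expansion forestalled by removing entries */
    ACTION_EXPAND       /* hash was doubled */
} ResizeAction;

typedef struct ToyHash
{
    size_t nentries;        /* live entries */
    size_t capacity;        /* expansion threshold */
    size_t nstale;          /* entries currently judged removable */
    bool   cleaned_once;    /* has a cleanup ever been done? */
    long   last_cleanup;    /* time of last cleanup, in seconds */
} ToyHash;

/*
 * Decide what to do when an insertion hits the expansion threshold.
 * The first time, or when enough time has passed since the previous
 * cleanup, try to forestall expansion by cleaning up. Otherwise, or
 * when there is nothing to clean, just expand.
 */
static ResizeAction
toy_on_full(ToyHash *h, long now)
{
    bool cleanup_allowed;

    if (h->nentries < h->capacity)
        return ACTION_NONE;

    cleanup_allowed = !h->cleaned_once ||
        (now - h->last_cleanup) >= CLEANUP_RETRY_SECS;

    if (cleanup_allowed && h->nstale > 0)
    {
        h->nentries -= h->nstale;   /* stands in for the real purge */
        h->nstale = 0;
        h->cleaned_once = true;
        h->last_cleanup = now;
        return ACTION_CLEANUP;
    }

    h->capacity *= 2;
    return ACTION_EXPAND;
}
```

The sketch treats any nonzero number of removable entries as enough to
forestall expansion; a real implementation would probably want a
minimum fraction there, which is left open here.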

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center
