RE: Protect syscache from bloating with negative cache entries

From: "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>
To: 'Tom Lane' <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: 'Kyotaro HORIGUCHI' <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, "Ideriha, Takeshi" <ideriha(dot)takeshi(at)jp(dot)fujitsu(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "alvherre(at)alvh(dot)no-ip(dot)org" <alvherre(at)alvh(dot)no-ip(dot)org>, "andres(at)anarazel(dot)de" <andres(at)anarazel(dot)de>, "robertmhaas(at)gmail(dot)com" <robertmhaas(at)gmail(dot)com>, "michael(dot)paquier(at)gmail(dot)com" <michael(dot)paquier(at)gmail(dot)com>, "david(at)pgmasters(dot)net" <david(at)pgmasters(dot)net>, "Jim(dot)Nasby(at)bluetreble(dot)com" <Jim(dot)Nasby(at)bluetreble(dot)com>, "craig(at)2ndquadrant(dot)com" <craig(at)2ndquadrant(dot)com>
Subject: RE: Protect syscache from bloating with negative cache entries
Date: 2019-01-16 06:41:58
Message-ID: 0A3221C70F24FB45833433255569204D1FB684A6@G01JPEXMBYT05
Lists: pgsql-hackers

From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
> Certainly, what I've done here doesn't preclude adding some wider solution
> to the issue of extremely large catcaches.

I'm relieved to hear that.

> I think it takes the pressure off
> for one rather narrow problem case, and the mechanism could be used to fix
> other ones. But if you've got an application that just plain accesses a
> huge number of objects, this isn't going to make your life better.

I understand you're trying to solve the problem caused by negative cache entries as soon as possible, because users are really suffering from it. I sympathize with that attitude, because you always seem to be addressing issues that others are reluctant to take on. That's one of the reasons I respect you.

> Well, we *had* an LRU mechanism for the catcaches way back when. We got
> rid of it --- see commit 8b9bc234a --- because (a) maintaining the LRU
> info was expensive and (b) performance fell off a cliff in scenarios where
> the cache size limit was exceeded. You could probably find some more info
> about that by scanning the mail list archives from around the time of that
> commit, but I'm too lazy to do so right now.

Oh, in 2006... I'll examine the patch and the discussion to see how the LRU management was done.

> That was a dozen years ago, and it's possible that machine performance
> has moved so much since then that the problems are gone or mitigated.

I really, really hope so. Even if LRU management has some visible impact, I think that's a debt PostgreSQL should have paid long ago but hasn't yet. Even MySQL, whose single-process architecture doesn't multiply cache bloat across many server processes, has the ability to limit its caches. And PostgreSQL already has parameters for various memory components such as shared_buffers, wal_buffers, work_mem, etc., so it would be reasonable to also have a limit for the catalog caches. That said, we can avoid the penalty and retain the current performance by disabling the limit (some_size_param = 0).

I think we can evaluate the impact of LRU management by adding prev and next members to the catcache and relcache entry structures, and moving an entry to the front (or back) of the LRU chain every time it is accessed. I think pgbench's select-only mode is sufficient for the evaluation. I'd like to hear if any other workload would better expose the CPU cache effects.

> In particular I'm sure that any limit we would want to impose today will
> be far more than the 5000-entries-across-all-caches limit that was in use
> back then. But I'm not convinced that a workload that would create 100K
> cache entries in the first place wouldn't have severe problems if you
> tried to constrain it to use only 80K entries. I fear it's just wishful
> thinking to imagine that the behavior of a larger cache won't be just
> like a smaller one. Also, IIRC some of the problem with the LRU code
> was that it resulted in lots of touches of unrelated data, leading to
> CPU cache miss problems. It's hard to see how that doesn't get even
> worse with a bigger cache.
> As far as the relcache goes, we've never had a limit on that, but there
> are enough routine causes of relcache flushes --- autovacuum for instance
> --- that I'm not really convinced relcache bloat can be a big problem in
> production.

As Andres and Robert mentioned, we want to free less frequently used cache entries. As things stand, we are suffering from cache bloat amounting to terabytes of memory. This is a real issue, not a hypothetical one...

> The plancache has never had a limit either, which is a design choice that
> was strongly influenced by our experience with catcaches. Again, I'm
> concerned about the costs of adding a management layer, and the likelihood
> that cache flushes will simply remove entries we'll soon have to rebuild.

Fortunately, the plan cache hasn't bothered us. But I remember you said you were annoyed by PL/pgSQL's use of the plan cache at Salesforce. Were you able to overcome that somehow?

> Oracle seldom impresses me as having designs we ought to follow.
> They have a well-earned reputation for requiring a lot of expertise to
> operate, which is not the direction this project should be going in.
> In particular, I don't want to "solve" cache size issues by exposing
> a bunch of knobs that most users won't know how to twiddle.

Oracle certainly seems to be difficult to use. But they seem to be studying other DBMSs to make it simpler to use. I'm sure they also have a lot we should learn from, and the cache limit is one of them (although MySQL's per-cache tuning may be better).

And having limits for various components would be the first step toward an autonomous database: tunable -> auto-tuning -> autonomous.

Takayuki Tsunakawa
