Re: Protect syscache from bloating with negative cache entries

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Protect syscache from bloating with negative cache entries
Date: 2016-12-20 23:18:13
Message-ID: CA+TgmoYz3Neau8WiamQD5s8fDQgjv7b+UgDmtjLGshyEap4L4g@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 20, 2016 at 3:10 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Tue, Dec 20, 2016 at 10:09 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> I don't understand why we'd make that a system-wide behavior at all,
>>> rather than expecting each process to manage its own cache.
>
>> Individual backends don't have a really great way to do time-based
>> stuff, do they? I mean, yes, there is enable_timeout() and friends,
>> but I think that requires quite a bit of bookkeeping.
>
> If I thought that "every ten minutes" was an ideal way to manage this,
> I might worry about that, but it doesn't really sound promising at all.
> Every so many queries would likely work better, or better yet make it
> self-adaptive depending on how much is in the local syscache.

I don't think "every so many queries" is very promising at all.
First, it has the same problem as a fixed cap on the number of
entries: if you're doing a round-robin just slightly bigger than that
value, performance will be poor. Second, what's really important here
is to keep the percentage of wall-clock time spent populating the
system caches small. If a backend is doing 4000 queries/second and
each of those 4000 queries touches a different table, it really needs
a cache of at least 4000 entries or it will thrash and slow way down.
But if it's doing a query every 10 minutes and those queries
round-robin between 4000 different tables, it doesn't really need a
4000-entry cache. If those queries are long-running, the time to
repopulate the cache will only be a tiny fraction of runtime. If the
queries are short-running, then the effect is, percentage-wise, just
the same as for the high-volume system, but in practice it isn't
likely to be felt as much. I mean, if we keep a bunch of old cache
entries around on a mostly-idle backend, they are going to be pushed
out of CPU caches and maybe even paged out. One can't expect a
backend that is woken up after a long sleep to be quite as snappy as
one that's continuously active.
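
Editorial illustration (not part of the original message): a minimal sketch of the two arguments above, assuming a plain LRU cache with a fixed entry cap. The function names and the 1 ms reload cost are hypothetical and chosen only for illustration; this is not how PostgreSQL's syscache is implemented.

```python
from collections import OrderedDict


def lru_hit_rate(capacity, n_tables, n_queries):
    """Simulate a capped LRU cache under round-robin access; return the hit rate."""
    cache = OrderedDict()
    hits = 0
    for i in range(n_queries):
        key = i % n_tables  # round-robin over the tables
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
        else:
            cache[key] = True  # "populate" the entry
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used entry
    return hits / n_queries


def reload_fraction(reload_cost_s, query_runtime_s):
    """Share of wall-clock time spent repopulating the cache if every query misses."""
    return reload_cost_s / (reload_cost_s + query_runtime_s)


if __name__ == "__main__":
    # Round-robin over one more table than the cap: every lookup misses.
    print(lru_hit_rate(capacity=4000, n_tables=4001, n_queries=40010))
    # Round-robin that fits the cap: only the first pass misses.
    print(lru_hit_rate(capacity=4000, n_tables=4000, n_queries=40000))
    # Assumed 1 ms reload cost against a 10-minute query.
    print(reload_fraction(0.001, 600.0))
```

With a cap of 4000 and 4001 tables the hit rate is 0.0, while 4000 tables over 40000 queries gives 0.9. Under the assumed 1 ms reload cost, a query that runs for ten minutes spends well under 0.001% of its time repopulating the cache.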

Which gets to my third point: anything that's based on number of
queries won't do anything to help the case where backends sometimes go
idle and sit there for long periods. Reducing resource utilization in
that case would be beneficial. Ideally I'd like to get rid of not
only the backend-local cache contents but the backend itself, but
that's a much harder project.
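
Editorial illustration (not part of the original message): a minimal sketch of time-based pruning, assuming each entry records when it was last accessed and that something calls `prune` periodically. The class `IdleAwareCache`, its methods, and the `max_idle_s` parameter are hypothetical names, not PostgreSQL APIs. The clock is passed in explicitly so the behavior is easy to check.

```python
class IdleAwareCache:
    """Toy cache that drops entries not accessed within max_idle_s seconds."""

    def __init__(self, max_idle_s):
        self.max_idle_s = max_idle_s
        self._entries = {}  # key -> (value, last_access_time)

    def get(self, key, load, now):
        """Return the cached value, loading it on a miss; refresh its access time."""
        entry = self._entries.get(key)
        value = entry[0] if entry is not None else load(key)
        self._entries[key] = (value, now)
        return value

    def prune(self, now):
        """Remove entries idle for longer than max_idle_s; return how many were removed."""
        stale = [k for k, (_, last) in self._entries.items()
                 if now - last > self.max_idle_s]
        for k in stale:
            del self._entries[k]
        return len(stale)

    def __len__(self):
        return len(self._entries)


if __name__ == "__main__":
    cache = IdleAwareCache(max_idle_s=600.0)
    for table in range(4000):
        cache.get(table, load=str, now=0.0)  # busy period fills the cache
    print(cache.prune(now=300.0), len(cache))  # nothing is stale yet
    cache.get(0, load=str, now=500.0)          # one entry is still in use
    print(cache.prune(now=700.0), len(cache))  # the idle entries are dropped
```

Unlike a query-count trigger, the pruning decision here depends only on elapsed time, so a backend that goes idle still sheds its entries once something invokes `prune`. How a real backend would arrange that call is exactly the bookkeeping question raised earlier in the thread.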

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
