Re: Protect syscache from bloating with negative cache entries

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Protect syscache from bloating with negative cache entries
Date: 2018-03-07 20:31:03
Message-ID: CA+TgmoakgYA8=q_pUituJFBC57G1-uLkXa=A-rSkpfKy3cTJsQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Mar 7, 2018 at 6:01 AM, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> The thing that comes to mind when reading this patch is that some time
> ago we made fun of other database software, "they are so complicated to
> configure, they have some magical settings that few people understand
> how to set". Postgres was so much better because it was simple to set
> up, no magic crap. But now it becomes apparent that that only was so
> because Postgres sucked, ie., we hadn't yet gotten to the point where we
> *needed* to introduce settings like that. Now we finally are?
>
> I have to admit being a little disappointed about that outcome.

I think your disappointment is a little excessive. I am not convinced
either that this needs any GUCs at all, but if it makes
other people happy to have them, then I think it's worth accepting
that as the price of getting the feature into the tree. These are
scarcely the first GUCs we have that are hard to tune. work_mem is a
terrible knob, and there are probably very few people who know
how to set ssl_ecdh_curve to anything other than the default, and
what's geqo_selection_bias good for, anyway? I'm not sure what makes
the settings we're adding here any different. Most people will ignore
them, and a few people who really care can change the values.

> I wonder if this is just because we refuse to acknowledge the notion of
> a connection pooler. If we did, and the pooler told us "here, this
> session is being given back to us by the application, we'll keep it
> around until the next app comes along", could we clean the oldest
> inactive cache entries at that point? Currently they use DISCARD for
> that. Though this does nothing to fix hypothetical cache bloat for
> pg_dump in bug #14936.

We could certainly clean the oldest inactive cache entries at that
point, but there's no guarantee that would be the right thing to do.
If the working set across all applications is small enough that you
can keep them all in the caches all the time, then you should do that,
for maximum performance. If not, DISCARD ALL should probably flush
everything that the last application needed and the next application
won't. But without some configuration knob, you have zero way of
knowing how concerned the user is about saving memory in this place
vs. improving performance by reducing catalog scans. Even with such a
knob it's a little difficult to say which things actually ought to be
thrown away.
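
To make the trade-off concrete, here is a minimal sketch of age-based
pruning driven by a single knob. It is purely illustrative and is not
taken from the patch under discussion: the class ToyCatalogCache, the
parameter prune_min_age, and the explicit "now" timestamps are all
hypothetical names chosen for the example.

```python
from typing import Callable, Dict, Hashable, Optional


class Entry:
    """One cached lookup result; value None models a negative entry."""

    def __init__(self, value: object, last_access: float) -> None:
        self.value = value
        self.last_access = last_access


class ToyCatalogCache:
    """Toy stand-in for a per-backend catalog cache (not PostgreSQL code)."""

    def __init__(self, prune_min_age: Optional[float]) -> None:
        # Hypothetical knob: entries idle for more than this many seconds
        # may be pruned. None means "never prune", i.e. favor performance
        # over memory.
        self.prune_min_age = prune_min_age
        self.entries: Dict[Hashable, Entry] = {}

    def lookup(self, key: Hashable, load: Callable[[Hashable], object],
               now: float) -> object:
        entry = self.entries.get(key)
        if entry is None:
            # Cache miss: this is where the expensive catalog scan would go.
            entry = Entry(load(key), now)
            self.entries[key] = entry
        entry.last_access = now
        return entry.value

    def prune(self, now: float) -> int:
        """Drop entries idle longer than the knob allows; return the count."""
        if self.prune_min_age is None:
            return 0
        stale = [key for key, entry in self.entries.items()
                 if now - entry.last_access > self.prune_min_age]
        for key in stale:
            del self.entries[key]
        return len(stale)


# Usage: with a 600-second threshold, only the long-idle entry is dropped.
cache = ToyCatalogCache(prune_min_age=600)
cache.lookup("missing_table", lambda key: None, now=0)    # negative entry
cache.lookup("pg_class", lambda key: "row", now=500)
removed = cache.prune(now=700)
```

Even in this toy, the right value for the threshold depends entirely
on how much the user values memory over avoided catalog scans, and the
code has no way to infer that by itself.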

I think this is a related problem, but a different one. I also think
we ought to have built-in connection pooling. :-)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
