Re: Protect syscache from bloating with negative cache entries

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, robertmhaas(at)gmail(dot)com, craig(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Protect syscache from bloating with negative cache entries
Date: 2017-01-22 02:54:44
Message-ID: 19334.1485053684@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com> writes:
> The other (possibly naive) question I have is how useful negative
> entries really are? Will Postgres regularly incur negative lookups, or
> will these only happen due to user activity?

It varies depending on the particular syscache, but in at least some
of them, negative cache entries are critical for performance.
See for example RelnameGetRelid(), which basically does a RELNAMENSP
cache lookup for each schema down the search path until it finds a
match. For any user table name with the standard search_path, there's
a guaranteed failure in pg_catalog before you can hope to find a match.
If we don't have negative cache entries, then *every invocation of this
function has to go to disk* (or at least to shared buffers).

It's possible that we could revise all our lookup patterns to avoid this
sort of thing. But I don't have much faith in that always being possible,
and exactly none that we won't introduce new lookup patterns that need it
in future. I spent some time, for instance, wondering if RelnameGetRelid
could use a SearchSysCacheList lookup instead, doing the lookup on table
name only and then inspecting the whole list to see which entry is
frontmost according to the current search path. But that has performance
failure modes of its own, for example if you have identical table names in
a boatload of different schemas. We do it that way for some other cases
such as function lookups, but I think it's much less likely that people
have identical function names in N schemas than that they have identical
table names in N schemas.

If you want to poke into this for particular test scenarios, building with
CATCACHE_STATS defined will yield a bunch of numbers dumped to the
postmaster log at each backend exit.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2017-01-22 07:04:27 Re: new autovacuum criterion for visible pages
Previous Message Thomas Munro 2017-01-22 00:49:02 Re: remote_apply for logical replication?