From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: robertmhaas(at)gmail(dot)com
Cc: hlinnaka(at)iki(dot)fi, ideriha(dot)takeshi(at)jp(dot)fujitsu(dot)com, tomas(dot)vondra(at)2ndquadrant(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, andres(at)anarazel(dot)de, tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com, alvherre(at)2ndquadrant(dot)com, bruce(at)momjian(dot)us, pgsql-hackers(at)lists(dot)postgresql(dot)org, michael(dot)paquier(at)gmail(dot)com, david(at)pgmasters(dot)net, craig(at)2ndquadrant(dot)com
Subject: Re: Protect syscache from bloating with negative cache entries
Date: 2020-11-19 05:25:36
Message-ID: 20201119.142536.1685839779534143847.horikyota.ntt@gmail.com
Lists: pgsql-hackers

Thank you for the comments.

At Tue, 17 Nov 2020 16:22:54 -0500, Robert Haas <robertmhaas(at)gmail(dot)com> wrote in
> On Tue, Nov 17, 2020 at 10:46 AM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> > 0.7% degradation is probably acceptable.
>
> I haven't looked at this patch in a while and I'm pleased with the way
> it seems to have been redesigned. It seems relatively simple and
> unlikely to cause big headaches. I would say that 0.7% is probably not
> acceptable on a general workload, but it seems fine on a benchmark

Sorry for the confusing notation: "-0.7% degradation" actually meant a
+0.7% *gain*, which I think is an error anyway. However, the next patch
makes catcache apparently *faster*, so the difference doesn't matter
much..

> that is specifically designed to be a worst-case for this patch, which
> I gather is what's happening here. I think it would be nice if we
> could enable this feature by default. Does it cause a measurable
> regression on realistic workloads when enabled? I bet a default of 5
> or 10 minutes would help many users.
>
> One idea for improving things might be to move the "return
> immediately" tests in CatCacheCleanupOldEntries() to the caller, and
> only call this function if they indicate that there is some purpose.
> This would avoid the function call overhead when nothing can be done.
> Perhaps the two tests could be combined into one and simplified. Like,
> suppose the code looks (roughly) like this:
>
> if (catcacheclock >= time_at_which_we_can_prune)
> CatCacheCleanupOldEntries(...);

The compiler removes the call (or inlines the function), but of course
we can write it that way, and it shows the condition for calling the
function more clearly. However, the codelet above doesn't take the
result of CatCacheCleanupOldEntries() itself into account. The function
returns false when all "old" entries have been invalidated or
explicitly removed, and we need to expand the hash in that case.
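To make the point concrete, here is a minimal standalone sketch of the caller-side shape being discussed. The field name cc_oldest_ts follows the patch, but the stand-in struct, the stubbed CatCacheCleanupOldEntries(), and the helper maybe_prune_or_rehash() are invented here purely for illustration:

```c
#include <stdint.h>
#include <stdbool.h>

/* stand-in for the real catcache structure, for illustration only */
typedef struct CatCache
{
	uint64_t	cc_oldest_ts;	/* oldest entry timestamp, in microseconds */
	int			cc_nbuckets;	/* current number of hash buckets */
} CatCache;

static uint64_t catcacheclock;		/* cached current time, in microseconds */
static uint64_t prune_min_age_us;	/* GUC value converted to microseconds */

/*
 * Returns true when at least one entry was pruned; false means nothing
 * could be removed and the hash still needs to grow.  Stubbed here.
 */
static bool
CatCacheCleanupOldEntries(CatCache *cp)
{
	(void) cp;
	return false;
}

static void
RehashCatCache(CatCache *cp)
{
	cp->cc_nbuckets *= 2;		/* stand-in for the real rehash */
}

static void
maybe_prune_or_rehash(CatCache *cp)
{
	/*
	 * Call the pruning function only when it can have some effect; if it
	 * removes nothing (or cannot be called at all), expand the hash.
	 */
	if (catcacheclock - cp->cc_oldest_ts <= prune_min_age_us ||
		!CatCacheCleanupOldEntries(cp))
		RehashCatCache(cp);
}
```

With the feature disabled (prune_min_age_us set to UINT64_MAX), the first comparison is always true and the cleanup call is skipped entirely, so the disabled path costs only one subtraction and comparison.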

> To make it that simple, we want catcacheclock and
> time_at_which_we_can_prune to be stored as bare uint64 quantities so
> we don't need TimestampDifference(). And we want
> time_at_which_we_can_prune to be set to PG_UINT64_MAX when the feature
> is disabled. But those both seem like pretty achievable things... and
> it seems like the result would probably be faster than what you have
> now.

The time_at_which_we_can_prune is not global but catcache-local and
needs to change at the time catalog_cache_prune_min_age is changed.

So the next version does the following:

- if (CatCacheCleanupOldEntries(cp))
+ if (catcacheclock - cp->cc_oldest_ts > prune_min_age_us &&
+ CatCacheCleanupOldEntries(cp))

On the other hand, CatCacheCleanupOldEntries() can calculate the
time_at_which_we_can_prune once at the beginning of the function. That
makes the condition in the loop simpler:

- TimestampDifference(ct->lastaccess, catcacheclock, &age, &us);
-
- if (age > catalog_cache_prune_min_age)
+ if (ct->lastaccess < prune_threshold)
{

> + * per-statement basis and additionaly udpated periodically
>
> two words spelled wrong

Ugh. Fixed. I checked all the spellings and found another misspelling.

> +void
> +assign_catalog_cache_prune_min_age(int newval, void *extra)
> +{
> + catalog_cache_prune_min_age = newval;
> +}
>
> hmm, do we need this?

*That* assignment is actually useless as it was, but the function is
kept, and now it maintains the internal version of the GUC parameter
(the uint64 prune_min_age).
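In other words, the assign hook earns its keep by converting the user-visible seconds value into the internal microsecond form once, at assignment time. A hedged sketch of what that could look like (the variable names and the use of UINT64_MAX as the "disabled" sentinel follow the discussion above; the exact code in the patch may differ):

```c
#include <stdint.h>

#define USECS_PER_SEC 1000000ULL	/* matches PostgreSQL's definition */

/* the user-visible GUC, in seconds; -1 disables the feature */
static int	catalog_cache_prune_min_age = -1;

/* internal copy, in microseconds, kept in sync by the assign hook */
static uint64_t prune_min_age_us = UINT64_MAX;

/*
 * Sketch of the assign hook: besides storing the GUC value, keep the
 * microsecond copy in sync, using UINT64_MAX to mean "disabled" so the
 * hot-path comparison never fires when the feature is off.
 */
void
assign_catalog_cache_prune_min_age(int newval, void *extra)
{
	(void) extra;
	catalog_cache_prune_min_age = newval;
	prune_min_age_us = (newval < 0) ? UINT64_MAX
		: (uint64_t) newval * USECS_PER_SEC;
}
```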

> + /*
> + * Entries that are not accessed after the last pruning
> + * are removed in that seconds, and their lives are
> + * prolonged according to how many times they are accessed
> + * up to three times of the duration. We don't try shrink
> + * buckets since pruning effectively caps catcache
> + * expansion in the long term.
> + */
> + ct->naccess = Min(2, ct->naccess);
>
> The code doesn't match the comment, it seems, because the limit here
> is 2, not 3. I wonder if this does anything anyway. My intuition is
> that when a catcache entry gets accessed at all it's probably likely
> to get accessed a bunch of times. If there are any meaningful
> thresholds here I'd expect us to be trying to distinguish things like
> 1000+ accesses vs. 100-1000 vs. 10-100 vs. 1-10. Or maybe we don't
> need to distinguish at all and can just have a single mark bit rather
> than a counter.

Agreed. Since I don't see clear criteria for the threshold of the
counter, I removed naccess and the related lines.

I did the following changes in the attached.

1. Removed naccess and related lines.

2. Moved the precheck condition out of CatCacheCleanupOldEntries() to
RehashCatCache().

3. Use uint64 direct comparison instead of TimestampDifference().

4. Removed CatCTup.dead flag.

Performance measurement on the attached showed better results for
searching, but perhaps worse ones for cache entry creation. Each number
is the mean time of 10 runs.

# Catcache (negative) entry creation
: time(ms) (% to master)
master : 3965.61 (100.0)
patched-off: 4040.93 (101.9)
patched-on : 4032.22 (101.7)

# Searching negative cache entries
master : 8173.46 (100.0)
patched-off: 7983.43 ( 97.7)
patched-on : 8049.88 ( 98.5)

# Creation, searching and expiration
master : 6393.23 (100.0)
patched-off: 6527.94 (102.1)
patched-on : 15880.01 (248.4)

That is, catcache searching gets faster by 2-3%, but creation gets
slower by about 2%. If I move the condition of change 2 further up,
into CatalogCacheCreateEntry(), that degradation is reduced to 0.6%.

# Catcache (negative) entry creation
master : 3967.45 (100.0)
patched-off : 3990.43 (100.6)
patched-on : 4108.96 (103.6)

# Searching negative cache entries
master : 8106.53 (100.0)
patched-off : 8036.61 ( 99.1)
patched-on : 8058.18 ( 99.4)

# Creation, searching and expiration
master : 6395.00 (100.0)
patched-off : 6416.57 (100.3)
patched-on : 15830.91 (247.6)

It doesn't get smaller even if I revert the changed lines in
CatalogCacheCreateEntry()..

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v5-0001-CatCache-expiration-feature.patch text/x-patch 7.9 KB
v5-0002-Remove-dead-flag-from-catcache-tuple.patch text/x-patch 6.4 KB
v5-0003-catcachebench.patch text/x-patch 11.8 KB
