Re: Protect syscache from bloating with negative cache entries

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: hlinnaka(at)iki(dot)fi
Cc: andres(at)anarazel(dot)de, robertmhaas(at)gmail(dot)com, ideriha(dot)takeshi(at)jp(dot)fujitsu(dot)com, tomas(dot)vondra(at)2ndquadrant(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com, alvherre(at)2ndquadrant(dot)com, bruce(at)momjian(dot)us, pgsql-hackers(at)lists(dot)postgresql(dot)org, michael(dot)paquier(at)gmail(dot)com, david(at)pgmasters(dot)net, craig(at)2ndquadrant(dot)com
Subject: Re: Protect syscache from bloating with negative cache entries
Date: 2021-01-28 07:50:44
Message-ID: 20210128.165044.1288517296648402194.horikyota.ntt@gmail.com
Lists: pgsql-hackers

At Wed, 27 Jan 2021 13:11:55 +0200, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote in
> On 27/01/2021 03:13, Kyotaro Horiguchi wrote:
> > At Thu, 14 Jan 2021 17:32:27 +0900 (JST), Kyotaro Horiguchi
> > <horikyota(dot)ntt(at)gmail(dot)com> wrote in
> >> The commit 4656e3d668 (debug_invalidate_system_caches_always)
> >> conflicted with this patch. Rebased.
> > At Wed, 27 Jan 2021 10:07:47 +0900 (JST), Kyotaro Horiguchi
> > <horikyota(dot)ntt(at)gmail(dot)com> wrote in
> >> (I found a bug in a benchmark-aid function
> >> (CatalogCacheFlushCatalog2); I'll repost an updated version soon.)
> > I noticed that a catcachebench-aid function
> > CatalogCacheFlushCatalog2() wrongly allocates the bucket array in the
> > current memory context, which leads to a crash.
> > This is the fixed and rebased version.
>
> Thanks, with the scripts you provided, I was able to run the
> performance tests on my laptop, and got results very similar to
> yours.
>
> The impact of v7-0002-Remove-dead-flag-from-catcache-tuple.patch is
> very small. I think I could see it in the tests, but only barely. And
> the tests did nothing else than do syscache lookups; in any real world
> scenario, it would be lost in noise. I think we can put that aside for
> now, and focus on v6-0001-CatCache-expiration-feature.patch:

I agree with that opinion. But it's a bit disappointing that the long
struggle ended up in vain :p

> The pruning is still pretty lethargic:
>
> - Entries created in the same transaction are never pruned away
>
> - The size of the hash table is never shrunk. So even though the patch
>   puts a backstop to the hash table growing indefinitely, if you run one
>   transaction that bloats the cache, it's bloated for the rest of the
>   session.

Right. But more frequent checks would hurt performance. We could do
more aggressive pruning during idle time.

> I think that's OK. We might want to be more aggressive in the future,
> but for now it seems reasonable to lean towards the current behavior
> where nothing is pruned. Although I wonder if we should try to set
> 'catcacheclock' more aggressively. I think we could set it whenever
> GetCurrentTimestamp() is called, for example.

Ah, I hadn't thought of that direction. Something like
global_last_acquired_timestamp?
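Heikki's idea of refreshing 'catcacheclock' whenever the current time is read anyway could look roughly like the sketch below. It is only an illustration under stated assumptions: current_timestamp_and_update_clock() is a hypothetical stand-in for GetCurrentTimestamp(), ToyCatCTup and touch_entry() are hypothetical names, and the timestamp uses the Unix epoch rather than PostgreSQL's 2000-01-01 epoch.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

typedef int64_t TimestampTz;    /* microseconds; Unix epoch here, unlike PostgreSQL */

/* Coarse clock that catcache lookups read instead of calling the system clock. */
static TimestampTz catcacheclock = 0;

/* Hypothetical stand-in for a catcache tuple; only lastaccess matters here. */
typedef struct ToyCatCTup
{
    TimestampTz lastaccess;
} ToyCatCTup;

/*
 * Hypothetical stand-in for GetCurrentTimestamp(): whenever the real clock
 * is read anyway, refresh catcacheclock as a side effect at no extra cost.
 */
static TimestampTz
current_timestamp_and_update_clock(void)
{
    struct timespec ts;
    TimestampTz now;

    clock_gettime(CLOCK_REALTIME, &ts);
    now = (TimestampTz) ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
    catcacheclock = now;
    return now;
}

/* On a cache hit, stamping the entry is one load and one store, no syscall. */
static void
touch_entry(ToyCatCTup *ct)
{
    ct->lastaccess = catcacheclock;
}
```

The trade-off is that lastaccess is only as precise as the most recent clock read, which seems acceptable when the pruning age is measured in minutes.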

> Given how unaggressive this mechanism is, I think it should be safe to
> enable it by default. What would be a suitable default for
> catalog_cache_prune_min_age? 30 seconds?

Without having thought it through in detail, that seems a bit too
short. The values suggested so far for this variable have been
300-600s, so that intermittent queries running at intervals of about
5-10 minutes don't lose their cache entries.

In a bad case, two queries alternately remove each other's cache
entries.

Q1: adds 100 entries
<1 minute passed>

Q2: adds 100 entries, but a rehash is about to happen at 150 entries,
so the existing 100 entries added by Q1 are removed.
<1 minute passed>

Q1: adds 100 entries, but a rehash is about to happen at 150 entries,
so the existing 100 entries added by Q2 are removed.

<repeats>

Or, a sequence of transactions that lasts longer than that many
seconds may lose some of its catcache entries.
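The alternating Q1/Q2 case above can be reproduced with a toy model. This is a minimal sketch, not the patch's code, assuming: pruning runs only when an insertion would reach a fixed trigger of 150 entries (standing in for the rehash point), entries whose last access is at least the minimum age old are dropped, and if nothing is freed the trigger doubles as if the table had been rehashed. ToyCache, prune(), run_query() and simulate() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef int64_t TimestampTz;            /* microseconds */
#define USECS_PER_SEC INT64_C(1000000)
#define NKEYS 200                       /* Q1 uses keys 0-99, Q2 keys 100-199 */

/* Hypothetical toy cache: one slot per key, plus a rehash-like trigger. */
typedef struct ToyCache
{
    int         valid[NKEYS];
    TimestampTz lastaccess[NKEYS];
    int         ntup;
    int         threshold;
} ToyCache;

/* Drop entries whose last access is at least min_age_sec old. */
static void
prune(ToyCache *c, TimestampTz now, int min_age_sec)
{
    TimestampTz cutoff = now - (TimestampTz) min_age_sec * USECS_PER_SEC;

    for (int k = 0; k < NKEYS; k++)
        if (c->valid[k] && c->lastaccess[k] <= cutoff)
        {
            c->valid[k] = 0;
            c->ntup--;
        }
}

/* Look up 100 keys starting at 'first'; returns the number of misses. */
static int
run_query(ToyCache *c, int first, TimestampTz now, int min_age_sec)
{
    int         misses = 0;

    for (int k = first; k < first + 100; k++)
    {
        if (!c->valid[k])
        {
            misses++;
            if (c->ntup + 1 >= c->threshold)    /* a rehash is about to happen */
            {
                int         before = c->ntup;

                prune(c, now, min_age_sec);
                if (c->ntup == before)
                    c->threshold *= 2;  /* nothing freed: grow instead */
            }
            c->valid[k] = 1;
            c->ntup++;
        }
        c->lastaccess[k] = now;
    }
    return misses;
}

/* Alternate Q1 and Q2 one minute apart; return the total number of misses. */
static int
simulate(int min_age_sec, int rounds)
{
    ToyCache    c;
    int         total = 0;

    assert(min_age_sec >= 0 && rounds >= 0);
    memset(&c, 0, sizeof(c));
    c.threshold = 150;

    for (int r = 0; r < rounds; r++)
    {
        TimestampTz now = (TimestampTz) r * 60 * USECS_PER_SEC;
        int         first = (r % 2 == 0) ? 0 : 100;

        total += run_query(&c, first, now, min_age_sec);
    }
    return total;
}
```

In this model, simulate(30, 6) counts 600 misses because every run evicts the other query's 100 entries, while simulate(600, 6) counts 200: only the first run of each query misses, and the table grows once instead.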

> Documentation needs to be updated for the new GUC.
>
> Attached is a version with a few little cleanups:
> - use TimestampTz instead of uint64 for the timestamps
> - remove assign_catalog_cache_prune_min_age(). All it did was convert
>   the GUC's value from seconds to microseconds, and store it in a
>   separate variable. Multiplication is cheap, so we can just do it when
>   we use the GUC's value instead.

Yeah, the latter is a leftover from the struggle to cut down CPU
cycles in the normal paths. I don't object to doing so.
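With the assign hook gone, the seconds-to-microseconds conversion happens where the GUC is consumed. A minimal sketch, assuming the GUC is an int holding seconds; prune_threshold() is a hypothetical helper name, and USECS_PER_SEC is defined locally to mirror PostgreSQL's macro of the same name.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t TimestampTz;            /* microseconds */
#define USECS_PER_SEC INT64_C(1000000)  /* mirrors PostgreSQL's macro */

/* GUC value in seconds; 600 is the default proposed in this mail. */
static int catalog_cache_prune_min_age = 600;

/*
 * Hypothetical helper: compute the cutoff at the point of use.  Widening
 * to 64 bits before multiplying avoids int overflow for large settings.
 */
static TimestampTz
prune_threshold(TimestampTz now)
{
    return now - (TimestampTz) catalog_cache_prune_min_age * USECS_PER_SEC;
}
```

With the default of 600, a call at now = 600 seconds (600000000 microseconds) yields a threshold of 0.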

I found that some comments are apparently stale. cp->cc_oldest_ts is
not used anywhere, although it was added to decide whether to scan
or not.

I fixed the following points in the attached.

- Removed some comments that are obvious. ("Timestamp in us")
- Added a cp->cc_oldest_ts check in CatCacheCleanupOldEntries.
- Set the default value for catalog_cache_prune_min_age to 600s.
- Added a doc entry for the new GUC in the resource/memory section.
- Fixed some code comments.
- Adjusted the pruning criterion from (ct->lastaccess < prune_threshold) to <=.
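The cc_oldest_ts shortcut and the <= criterion from the list above might combine as follows. This is a simplified sketch over a flat array rather than catcache's hash buckets; ToyCache, ToyTup and cleanup_old_entries() are hypothetical stand-ins, and only the field names cc_oldest_ts and lastaccess come from this mail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int64_t TimestampTz;            /* microseconds */
#define USECS_PER_SEC INT64_C(1000000)

/* Hypothetical stand-in for a catcache tuple. */
typedef struct ToyTup
{
    TimestampTz lastaccess;
    bool        live;
} ToyTup;

/* Hypothetical stand-in for a catcache. */
typedef struct ToyCache
{
    ToyTup     *tuples;
    int         ntuples;
    TimestampTz cc_oldest_ts;   /* lower bound of lastaccess over live entries */
} ToyCache;

/* Remove entries not accessed within min_age_sec; returns the count removed. */
static int
cleanup_old_entries(ToyCache *cp, TimestampTz now, int min_age_sec)
{
    TimestampTz prune_threshold = now - (TimestampTz) min_age_sec * USECS_PER_SEC;
    TimestampTz oldest = INT64_MAX;
    int         removed = 0;

    /* Cheap early exit: even the oldest entry is too young to prune. */
    if (cp->cc_oldest_ts > prune_threshold)
        return 0;

    for (int i = 0; i < cp->ntuples; i++)
    {
        ToyTup     *ct = &cp->tuples[i];

        if (!ct->live)
            continue;
        if (ct->lastaccess <= prune_threshold)  /* "<=" rather than "<" */
        {
            ct->live = false;
            removed++;
        }
        else if (ct->lastaccess < oldest)
            oldest = ct->lastaccess;
    }

    /* Remember the oldest survivor so the next call can skip the scan. */
    cp->cc_oldest_ts = oldest;
    return removed;
}
```

For example, with entries last accessed at 0s, 100s and 700s, a call at now = 700s with a 600s minimum age removes the first two (the 100s entry only because of <=) and leaves cc_oldest_ts at 700s, so an immediate second call returns without scanning. Code that inserts entries would also have to lower cc_oldest_ts when needed; that part is not shown.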

I was going to write in the doc something like "you can inspect memory
consumption by catalog caches using pg_backend_memory_contexts", but
all the memory used by the catalog caches is allocated in
CacheMemoryContext. Would it be sensible for each catalog cache to
have its own context?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v9-0001-CatCache-expiration-feature.patch text/x-patch 8.8 KB
