RE: Global shared meta cache

From: "ideriha(dot)takeshi(at)fujitsu(dot)com" <ideriha(dot)takeshi(at)fujitsu(dot)com>
To: 'Konstantin Knizhnik' <k(dot)knizhnik(at)postgrespro(dot)ru>, 'Amit Langote' <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, "andres(at)anarazel(dot)de" <andres(at)anarazel(dot)de>, 'Thomas Munro' <thomas(dot)munro(at)gmail(dot)com>
Subject: RE: Global shared meta cache
Date: 2019-10-08 08:36:10
Message-ID: OSAPR01MB1985B69233D6F746B48F69EDEA9A0@OSAPR01MB1985.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

Hi, Konstantin

I'm very sorry for the late response, and thank you for your feedback.
(I am re-sending this email because my email address changed and the original could not be delivered to -hackers.)

>From: Konstantin Knizhnik [mailto:k(dot)knizhnik(at)postgrespro(dot)ru]
>
>Takeshi-san,
>
>I am sorry for the late response; I was waiting for a new version of the patch
>from you to review.

Although I haven't incorporated your idea yet, I made a PoC patch that supports regular CREATE TABLE, SELECT, and DROP TABLE.

To be honest, the current patch is not very sophisticated.
With the global catalog cache enabled it fails some installcheck tests, and it is around 2k LOC.

>I read your last proposal and it seems to be very reasonable.
>From my point of view, we cannot reach an acceptable level of performance
>if we do not have a local cache at all.
>So, as you proposed, we should maintain a local cache for uncommitted data.
Yeah, I did this in my patch.

>I think that the size of the global cache should be limited (you have introduced a GUC for it).
>In principle it is possible to use dynamic shared memory and have an
>unlimited global cache.
>But I do not see much sense in it.
Yes, I limit the size of the global cache. Right now it doesn't support an eviction policy such as LRU.
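For illustration, a minimal C sketch of what a hard-limited global cache without eviction could look like. The names (GlobalCacheArena, gc_insert, the limit field) are hypothetical and not taken from the attached patch, and a fixed array stands in for the DSA-backed shared memory:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ARENA_SIZE 4096                      /* toy stand-in for a DSA segment */
#define ALIGN_UP(len) (((len) + 7) & ~(size_t) 7)

typedef struct GlobalCacheArena
{
    size_t limit;                            /* hard limit, e.g. from a GUC (hypothetical) */
    size_t used;                             /* bytes handed out so far */
    char   data[ARENA_SIZE];
} GlobalCacheArena;

static void
gc_arena_init(GlobalCacheArena *arena, size_t limit)
{
    arena->limit = limit < ARENA_SIZE ? limit : ARENA_SIZE;
    arena->used = 0;
}

/*
 * Copy an entry into the global cache.  With no eviction policy, a full
 * cache simply refuses the entry (returns NULL) and the caller keeps it
 * in backend-local memory only.
 */
static void *
gc_insert(GlobalCacheArena *arena, const void *entry, size_t len)
{
    size_t need = ALIGN_UP(len);
    void  *dst;

    assert(arena->used <= arena->limit);
    if (need > arena->limit - arena->used)
        return NULL;
    dst = arena->data + arena->used;
    memcpy(dst, entry, len);
    arena->used += need;
    return dst;
}
```

Once the limit is reached, every further insert fails until something frees space, which is exactly the gap an LRU-style policy would fill later.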

>I do not completely understand from your description when you are going
>to evict an entry from the local cache.
>Just once the transaction is committed? I think it will be more efficient
>to also specify a memory threshold for the local cache size and use LRU or
>some other eviction policy to remove data from the local cache.
>So if the working set (accessed relations) fits within the local cache limit, there
>will be no performance penalty compared with the current implementation.
>There should be no difference at all on pgbench or other benchmarks
>with a relatively small number of relations.
>
>If an entry is not found in the local cache, then we should look for it in the
>global cache, and in case of a double cache miss, read it from disk.
>I do not completely understand why we need to store references to
>global cache entries in the local cache and use reference counters for global cache entries.
>Why can't we maintain just two independent caches?
>
>While there really are databases with hundreds or even thousands of
>tables, an application usually still works with only a small subset of them.
>So I think that the "working set" can still fit in memory. This is why I
>think that in case of a local cache miss and a global cache hit, we should
>copy the data from the global cache to the local cache so that it can be accessed in the future without any synchronization.
>
>Since we need to keep all uncommitted data in the local cache, there is
>still a chance of local memory overflow (if some transaction creates or
>alters too many tables).
>But I think that is a very exotic and rare use case. The problem of
>memory overflow usually arises when we have a large number of
>backends, each maintaining its own catalog cache.
>So I think that we should have a "soft" limit for the local cache and a "hard"
>limit for the global cache.

Oh, I hadn't thought of this idea at all. So the local cache is a sort of first-level cache and the global cache is a second-level cache. That sounds great.
It would be good for performance, and setting two GUC parameters to limit the local cache and the global cache gives the DBA complete control over memory.
Yes, uncommitted data should stay in the local cache, but that is the only exception.
Not having to keep track of references to the global cache from the local cache header also seems less complex to implement. I'll look into the design.

>I didn't think much about cache invalidation. I read your proposal, but
>frankly speaking I do not understand why it should be so complicated.
>Why can't we immediately invalidate the entry in the global cache and lazily
>(as is done now using invalidation signals) invalidate the local caches?
>

I was overthinking when the local/global cache entries are evicted. Simply, the process reads the sinval messages and then invalidates the entry. If the entry's refcount is not zero, the process marks it dead to prevent other processes from finding the obsolete entry in the global hash table.
The refcount of a global cache entry is raised between SearchSysCache() and ReleaseSysCache().
Freeing an invalidated global cache entry while its refcount is still raised would cause invalid memory access.
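A minimal, single-threaded C sketch of the refcount/dead-flag scheme described above. It assumes the invalidated entry is unlinked from the hash table and marked dead, and that the last ReleaseSysCache()-style unpin frees it; gc_add, gc_search, gc_release and gc_invalidate are hypothetical names, and real shared-memory code would need atomics or a lock around the refcount:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define NBUCKETS 16

typedef struct GlobalCatEntry
{
    unsigned int key;
    int          refcount;        /* pins held between search and release */
    bool         dead;            /* invalidated while still pinned */
    int          value;
} GlobalCatEntry;

/* Toy stand-in for the shared hash table; collisions are not handled. */
static GlobalCatEntry *hash_table[NBUCKETS];
static int entries_freed;

static void
entry_free(GlobalCatEntry *e)
{
    free(e);
    entries_freed++;
}

static void
gc_add(unsigned int key, int value)
{
    GlobalCatEntry *e = malloc(sizeof(*e));

    assert(e != NULL && hash_table[key % NBUCKETS] == NULL);
    *e = (GlobalCatEntry) {key, 0, false, value};
    hash_table[key % NBUCKETS] = e;
}

/* Analogous to SearchSysCache(): find a live entry and pin it. */
static GlobalCatEntry *
gc_search(unsigned int key)
{
    GlobalCatEntry *e = hash_table[key % NBUCKETS];

    if (e == NULL || e->key != key)
        return NULL;
    e->refcount++;
    return e;
}

/* Analogous to ReleaseSysCache(): the last holder frees a dead entry. */
static void
gc_release(GlobalCatEntry *e)
{
    assert(e->refcount > 0);
    if (--e->refcount == 0 && e->dead)
        entry_free(e);
}

/* Called while processing a sinval message for this key. */
static void
gc_invalidate(unsigned int key)
{
    GlobalCatEntry *e = hash_table[key % NBUCKETS];

    if (e == NULL || e->key != key)
        return;
    hash_table[key % NBUCKETS] = NULL;  /* new lookups can no longer find it */
    if (e->refcount == 0)
        entry_free(e);
    else
        e->dead = true;                 /* freed by the last gc_release() */
}
```

Unlinking before marking dead means a fresh entry for the same key can be built while existing holders finish with the stale copy.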

Regards,
Takeshi Ideriha

Attachments:
  0002-POC-global-catalog-cache.patch (application/octet-stream, 85.6 KB)
  0001-MemoryContext-for-shared-memory-based-on-DSA.patch (application/octet-stream, 11.2 KB)