RE: Global shared meta cache

From: "Ideriha, Takeshi" <ideriha(dot)takeshi(at)jp(dot)fujitsu(dot)com>
To: 'Amit Langote' <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, "andres(at)anarazel(dot)de" <andres(at)anarazel(dot)de>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: Global shared meta cache
Date: 2019-04-19 06:43:44
Message-ID: 4E72940DA2BF16479384A86D54D0988A7DB2CABC@G01JPEXMBKW04
Lists: pgsql-hackers

>From: Ideriha, Takeshi [mailto:ideriha(dot)takeshi(at)jp(dot)fujitsu(dot)com]
>Do you have any thoughts?
>
Hi, I have updated my idea and hope to get some feedback.

[TL;DR]
The basic idea consists of the following four points:
A. Users can choose which databases have their caches (relation cache and catalog cache) placed in shared memory, and how much memory each may use.
B. Caches of committed data live in shared memory. Caches of uncommitted data live in local memory.
C. Caches in shared memory carry xid information (xmin, xmax).
D. Cache entries that have not been used recently are evicted from shared memory.

[A]
Regarding point A, I can imagine that some databases are connected to by lots of clients while others are not.
So I introduced a new parameter in postgresql.conf, "shared_meta_cache",
which is disabled by default and needs a server restart to enable.
e.g. shared_meta_cache = 'db1:500MB, db2:100MB'
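
To illustrate the value format, here is a minimal Python sketch (not taken from
any patch) of how such a setting could be split into per-database limits. The
function name parse_shared_meta_cache, the accepted units (kB, MB, GB, TB as
multiples of 1024), and treating an empty string as "disabled" are assumptions
made for the example only.

```python
import re

# Assumed units: PostgreSQL-style memory units, multiples of 1024.
_UNITS = {"kB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}
_ENTRY = re.compile(r"^\s*([^:\s]+)\s*:\s*(\d+)\s*(kB|MB|GB|TB)\s*$")


def parse_shared_meta_cache(value):
    """Parse 'db1:500MB, db2:100MB' into {'db1': bytes, 'db2': bytes}.

    Hypothetical helper: an empty string is taken to mean "disabled".
    """
    limits = {}
    if not value.strip():
        return limits
    for item in value.split(","):
        match = _ENTRY.match(item)
        if match is None:
            raise ValueError(f"invalid shared_meta_cache entry: {item!r}")
        name, amount, unit = match.groups()
        if name in limits:
            raise ValueError(f"database listed twice: {name!r}")
        limits[name] = int(amount) * _UNITS[unit]
    return limits
```

With the example value above, db1 would get a budget of 500 * 1024 * 1024 bytes
and db2 a budget of 100 * 1024 * 1024 bytes.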

Some catalogs, such as pg_database, are shared across all databases in the cluster,
so their catcaches are allocated in a dedicated area of shared memory.
The size of this area is controlled by the "shared_meta_global_catcache" parameter, which is named after the global directory.
However, I would rather not list this parameter in postgresql.conf, to keep things simple for users. It is too detailed.

[B & C]
Regarding B & C, the motivation is that we don't want other backends to see uncommitted tables.
The search order is local memory -> shared memory -> disk.
A backend searches the cache in shared memory using its own snapshot and the xids of each cache entry.
When an entry is not found in shared memory, an entry with xmin is created in shared memory (but not in local memory).
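
The lookup order and the snapshot check can be sketched roughly as follows.
This is a simplified Python model, not PostgreSQL code: Snapshot, SharedEntry,
visible, lookup and the load_from_disk callback are hypothetical names, every
xid stored in shared memory is assumed to be committed (point B), and xid
wraparound is ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    xmax: int                              # xids >= xmax began after the snapshot
    in_progress: frozenset = frozenset()   # xids still running when it was taken

    def sees(self, xid):
        """True if the effects of xid are visible to this snapshot."""
        return xid < self.xmax and xid not in self.in_progress


@dataclass
class SharedEntry:
    value: object
    xmin: int                     # xid that created this version
    xmax: Optional[int] = None    # xid that superseded it, if any


def visible(entry, snap):
    """An entry is visible if its creator is seen and its replacer is not."""
    if not snap.sees(entry.xmin):
        return False
    return entry.xmax is None or not snap.sees(entry.xmax)


def lookup(key, snap, local, shared, load_from_disk):
    # 1. Local memory: uncommitted definitions made by this backend.
    if key in local:
        return local[key]
    # 2. Shared memory: the committed version visible to our snapshot.
    for entry in shared.get(key, []):
        if visible(entry, snap):
            return entry.value
    # 3. Disk: build the entry and store it in shared memory only.
    value, xmin = load_from_disk(key)
    shared.setdefault(key, []).append(SharedEntry(value, xmin))
    return value
```

Here local is a plain dict per backend, shared maps a key to a list of
versions, and load_from_disk(key) is assumed to return the value together with
the xmin of the catalog row it was built from.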

When a definition is changed by DDL, a new cache entry is created in local memory, so subsequent commands in the same transaction refer to the local entry if needed.
When the transaction commits, the local cache is cleared and the shared cache is updated. This update is done by setting xmax on the old entry
and adding a new entry with xmin. The idea behind adding the new entry is that a newly created cache entry (for a new or altered table)
is likely to be used by the next transactions. At this point we can probably make use of the current invalidation mechanism,
even though no invalidation message is sent to other backends.
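
The commit-time update could look roughly like the following Python sketch.
The names SharedEntry and publish_on_commit are hypothetical, locking is left
out entirely, and dropped objects (which would only need xmax set, with no new
version added) are not handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SharedEntry:
    value: object
    xmin: int                     # xid that created this version
    xmax: Optional[int] = None    # xid that superseded it, if any


def publish_on_commit(xid, local, shared):
    """Move the committing transaction's local entries into the shared cache."""
    for key, value in local.items():
        versions = shared.setdefault(key, [])
        for old in versions:
            if old.xmax is None:
                old.xmax = xid    # the old definition ends with this commit
        # Add the new definition right away, since it is likely to be used soon.
        versions.append(SharedEntry(value, xmin=xid))
    local.clear()
```

Old versions stay in the shared cache with xmax set, so backends holding older
snapshots can still find the definition they are supposed to see.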

[D]
As for D, I'm planning to benchmark with a simple LRU first. If the performance is bad, I will switch to another algorithm such as Clock.
We don't need to worry about evicting the local cache, because its lifetime is limited to a transaction, and I don't want to make it bloated.
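
For reference, a simple LRU with a byte budget can be sketched in Python as
below. The class name LRUCache and the explicit per-entry size argument are
assumptions for illustration; a shared-memory implementation would also need
locking, which is omitted here.

```python
from collections import OrderedDict


class LRUCache:
    """Size-bounded cache that evicts the least recently used entries."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self._entries = OrderedDict()   # key -> (value, size), oldest first

    def __contains__(self, key):
        return key in self._entries

    def get(self, key):
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key][0]

    def put(self, key, value, size):
        if size > self.capacity_bytes:
            raise ValueError("entry is larger than the whole cache")
        if key in self._entries:
            # Replacing an entry: release its old size first.
            self.used_bytes -= self._entries.pop(key)[1]
        while self.used_bytes + size > self.capacity_bytes:
            # Evict from the least recently used end until the entry fits.
            _, (_, old_size) = self._entries.popitem(last=False)
            self.used_bytes -= old_size
        self._entries[key] = (value, size)
        self.used_bytes += size
```

Clock approximates the same policy with a reference bit per entry instead of
reordering on every access, which is why it is a natural fallback if moving
entries on each lookup turns out to be too expensive.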

best regards,
Takeshi Ideriha
