RE: Global shared meta cache

From: "Ideriha, Takeshi" <ideriha(dot)takeshi(at)jp(dot)fujitsu(dot)com>
To: "Ideriha, Takeshi" <ideriha(dot)takeshi(at)jp(dot)fujitsu(dot)com>, 'Amit Langote' <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, "andres(at)anarazel(dot)de" <andres(at)anarazel(dot)de>, 'Thomas Munro' <thomas(dot)munro(at)gmail(dot)com>
Subject: RE: Global shared meta cache
Date: 2019-06-26 06:23:35
Message-ID: 4E72940DA2BF16479384A86D54D0988A7DB7CF50@G01JPEXMBKW04
Lists: pgsql-hackers

Hi, everyone.

>From: Ideriha, Takeshi [mailto:ideriha(dot)takeshi(at)jp(dot)fujitsu(dot)com]
>My current thoughts:
>- Each catcache has (maybe partial) HeapTupleHeader
>- put every catcache on shared memory and no local catcache
>- but catcache for aborted tuple is not put on shared memory
>- Hash table exists per kind of CatCache
>- These hash tables exist for each database and for shared catalogs
> - e.g. there is a hash table for pg_class of a DB

I talked about a shared CatCache (SysCache) with Thomas at PGCon, and he
suggested using sinval instead of xid to control cache visibility.
Based on this, I've changed my design. I'll send a PoC patch within a
week, but I'd like to share my idea beforehand. I'm sorry this email is
long, but I'd be happy to get any comments.

Basically, I won't make the shared catcache the default; it will be an option.

Both local and shared memory have catcache hash tables. A shared hash
entry is the catctup itself, and a local hash entry is a pointer to the
shared catctup. Strictly speaking, a local hash entry does not hold a
direct pointer but points to a handle of the shared catctup. The handle
is located in shared memory and points to the shared catctup. This
indirection is intended to avoid a dangling pointer in the local hash
entry when the shared catctup is evicted by LRU. (I'll describe the LRU
details in another email because I'll implement it later.)
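A minimal single-process sketch of the handle indirection, assuming hypothetical names (SharedCatCTup, SharedCatCTupHandle, LocalCatCacheEntry) that are not from the PoC. malloc() stands in for shared memory, and locking and reclamation of the handles themselves are left out. It only shows why a local entry that goes through a handle cannot dangle after eviction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified shared catctup (malloc = "shared memory"). */
typedef struct SharedCatCTup
{
    unsigned int hash_value;    /* hash of the cache key */
    int          payload;       /* stands in for the cached tuple */
} SharedCatCTup;

/* Hypothetical handle in shared memory; it outlives the catctup it points to. */
typedef struct SharedCatCTupHandle
{
    SharedCatCTup *ct;          /* NULL once the catctup has been evicted */
} SharedCatCTupHandle;

/* Hypothetical local hash entry: points to the handle, never to the catctup. */
typedef struct LocalCatCacheEntry
{
    unsigned int         hash_value;
    SharedCatCTupHandle *handle;
} LocalCatCacheEntry;

/* Evict a shared catctup (e.g. chosen by LRU): free it and clear the handle. */
static void
evict_shared_catctup(SharedCatCTupHandle *handle)
{
    assert(handle->ct != NULL);
    free(handle->ct);
    handle->ct = NULL;
}

/* Dereference a local entry; yields NULL rather than a dangling pointer. */
static SharedCatCTup *
local_entry_get(const LocalCatCacheEntry *entry)
{
    return entry->handle->ct;
}
```

When local_entry_get() returns NULL, the caller would treat it as a local miss and go on to the shared hash table.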

* Search and Insert
Currently, postgres searches the (local) hash table and, on a miss,
searches the actual catalog (shared buffers and disk) and builds the
cache entry; if the tuple is not found, it builds a negative cache entry.

In the new architecture, if an entry is not found in the local hash
table, postgres searches the shared one before consulting shared buffers.
In detail: first, postgres looks up the pointer in the local hash table.
If it's found, it dereferences the pointer and gets the catctup. If not,
it searches the shared hash table and, if the catctup is found there,
inserts its pointer into the local hash table. If the catctup doesn't
exist in the shared hash table either, postgres searches the actual
catalog, builds the cache entry and, in most cases, inserts it into the
shared hash table and its pointer into the local one. The exception is
an entry built from an uncommitted catalog tuple, which must not be seen
by other processes. Such an uncommitted entry is built in local memory
and pushed directly into the local table, not the shared one. Lastly, if
the tuple we're looking for doesn't exist, a negative entry is put into
the shared hash table.
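The lookup order above can be sketched as a toy, single-backend model. All names here (CatTup, CatalogRow, search_catcache and the arrays standing in for the local table, the shared table and the catalog) are hypothetical; linear arrays replace the real hash tables, and there is no locking or sinval handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ENTRIES 32

/* Hypothetical, heavily simplified catctup. */
typedef struct CatTup
{
    unsigned int key;
    bool         negative;     /* true: no such tuple in the catalog */
    int          value;        /* stands in for the cached tuple */
} CatTup;

/* Hypothetical catalog row; "uncommitted" = written by our own open xact. */
typedef struct CatalogRow
{
    unsigned int key;
    int          value;
    bool         uncommitted;
} CatalogRow;

static const CatalogRow catalog[] = {{1, 100, false}, {2, 200, true}};

static CatTup  shared_tups[MAX_ENTRIES];      /* shared hash table stand-in */
static int     n_shared;
static CatTup  local_only_tups[MAX_ENTRIES];  /* backend-private entries */
static int     n_local_only;
static CatTup *local_ptrs[MAX_ENTRIES];       /* local table: pointers only */
static int     n_local;

static void
remember_locally(CatTup *ct)
{
    assert(n_local < MAX_ENTRIES);
    local_ptrs[n_local++] = ct;
}

static const CatTup *
search_catcache(unsigned int key)
{
    /* 1. Local hash table. */
    for (int i = 0; i < n_local; i++)
        if (local_ptrs[i]->key == key)
            return local_ptrs[i];

    /* 2. Shared hash table; on a hit, keep a pointer locally. */
    for (int i = 0; i < n_shared; i++)
        if (shared_tups[i].key == key)
        {
            remember_locally(&shared_tups[i]);
            return &shared_tups[i];
        }

    /* 3. Actual catalog (shared buffers and disk in postgres). */
    const CatalogRow *row = NULL;
    for (size_t i = 0; i < sizeof(catalog) / sizeof(catalog[0]); i++)
        if (catalog[i].key == key)
            row = &catalog[i];

    CatTup *ct;
    if (row != NULL && row->uncommitted)
    {
        /* Must not be visible to other processes: local memory only. */
        assert(n_local_only < MAX_ENTRIES);
        ct = &local_only_tups[n_local_only++];
    }
    else
    {
        /* Committed tuple, or a negative entry: goes to shared memory. */
        assert(n_shared < MAX_ENTRIES);
        ct = &shared_tups[n_shared++];
    }
    ct->key = key;
    ct->negative = (row == NULL);
    ct->value = (row != NULL) ? row->value : 0;
    remember_locally(ct);
    return ct;
}
```

Searching for key 1 builds a shared entry, key 2 (uncommitted) builds a local-only entry, and key 3 (absent) builds a shared negative entry.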

* Invalidation and visibility control
Now let's talk about invalidation. Current cache invalidation is based
on the local and shared invalidation queues (sinval). When a transaction
commits, its sinval messages are queued into the shared queue. Other
processes read and process sinval messages at their own timing.

In the shared catcache, I follow the current sinval mechanism in most
parts, but I'll change the actions taken when a sinval message is queued
and when it is read by a process. When messages are added to the shared
queue, the corresponding shared cache entries (matched by hash value) are
identified and their "obsolete flag" is turned on. When a process reads a
sinval message, it deletes the matching local hash entries (pointers to
handles). Each process can keep seeing a shared catctup as long as its
pointer (local entry) is valid. Because the process hasn't processed the
sinval messages yet, it's fine for it to keep seeing the pointer to a
possibly stale entry. After a local entry is invalidated, the process
searches the shared hash table and accepts only a catctup whose obsolete
flag is off. The process sees the correct shared entry after reading the
invalidation messages because it checks the obsolete flag, and because
uncommitted entries never exist in shared memory at all.
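A toy sketch of the two invalidation actions described above, assuming hypothetical names (sinval_queue_message, sinval_process_message, lookup) and ignoring locking and the handle indirection. Queuing a message only sets the obsolete flag in shared memory; reading it only drops local pointers; shared lookups skip obsolete entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ENTRIES 32

typedef struct SharedCatCTup
{
    unsigned int hash_value;
    bool         obsolete;   /* set when a matching sinval msg is queued */
    int          value;      /* stands in for the cached tuple */
} SharedCatCTup;

static SharedCatCTup  shared_tups[MAX_ENTRIES];  /* shared table stand-in */
static int            n_shared;
static SharedCatCTup *local_ptrs[MAX_ENTRIES];   /* this backend's local table */
static int            n_local;

/* Committing backend queues a message: mark matching shared entries obsolete. */
static void
sinval_queue_message(unsigned int hash_value)
{
    for (int i = 0; i < n_shared; i++)
        if (shared_tups[i].hash_value == hash_value)
            shared_tups[i].obsolete = true;
    /* The message itself would also be appended to the shared queue here. */
}

/* This backend reads the message: drop only its local pointers. */
static void
sinval_process_message(unsigned int hash_value)
{
    int kept = 0;

    for (int i = 0; i < n_local; i++)
        if (local_ptrs[i]->hash_value != hash_value)
            local_ptrs[kept++] = local_ptrs[i];
    n_local = kept;
}

/* Local pointers stay usable until processed; shared search skips obsolete. */
static SharedCatCTup *
lookup(unsigned int hash_value)
{
    for (int i = 0; i < n_local; i++)
        if (local_ptrs[i]->hash_value == hash_value)
            return local_ptrs[i];

    for (int i = 0; i < n_shared; i++)
        if (shared_tups[i].hash_value == hash_value && !shared_tups[i].obsolete)
        {
            assert(n_local < MAX_ENTRIES);
            local_ptrs[n_local++] = &shared_tups[i];
            return &shared_tups[i];
        }
    return NULL;    /* caller would fall back to the catalog */
}
```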

There is a subtle point here. Always finding a shared catctup without
the obsolete mark assumes that the process has already read the sinval
messages. So before searching the shared table, I make the process read
the sinval messages. After that, the local cache state is consistent with
the action of fetching a new entry. This reading timing is almost the
same as in current postgres, because in both the current design and mine
it happens after a local cache miss. After a cache miss in the current
design, a process opens the catalog relation and acquires a heavyweight
lock, and at that point it actually reads the sinval messages. (These
points are well summarized in Robert Haas's talk at PGCon [1].)
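The ordering rule above (on a local miss, catch up on sinval before searching the shared table) can be sketched as follows. The queue, the Backend struct and the function names are hypothetical; in postgres the catch-up corresponds to AcceptInvalidationMessages(), which is reached when the catalog relation is locked. Here it is called explicitly.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_MSGS    32
#define MAX_ENTRIES 32

/* Hypothetical shared sinval queue; each message carries a catcache hash. */
static unsigned int sinval_queue[MAX_MSGS];
static int          sinval_tail;            /* number of messages queued */

/* Hypothetical per-backend state: read position plus local entries. */
typedef struct Backend
{
    int          next_msg;                  /* this backend's read position */
    unsigned int local_hashes[MAX_ENTRIES]; /* local entries, by hash value */
    int          n_local;
} Backend;

static void
queue_sinval(unsigned int hash_value)
{
    assert(sinval_tail < MAX_MSGS);
    sinval_queue[sinval_tail++] = hash_value;
}

/* Read every pending message and drop the matching local entries. */
static void
accept_invalidation_messages(Backend *be)
{
    while (be->next_msg < sinval_tail)
    {
        unsigned int h = sinval_queue[be->next_msg++];
        int          kept = 0;

        for (int i = 0; i < be->n_local; i++)
            if (be->local_hashes[i] != h)
                be->local_hashes[kept++] = be->local_hashes[i];
        be->n_local = kept;
    }
}

/* True on a local hit; on a miss, catch up on sinval first. */
static bool
search_local_then_prepare_shared(Backend *be, unsigned int hash_value)
{
    for (int i = 0; i < be->n_local; i++)
        if (be->local_hashes[i] == hash_value)
            return true;    /* no sinval read: a possibly stale entry is OK */

    /* Must happen before the shared hash table is searched. */
    accept_invalidation_messages(be);
    return false;           /* caller now searches the shared table */
}
```

A local hit leaves pending messages unread; only a miss forces the catch-up, mirroring the timing in the current design.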

Lastly, we need to remove a shared catctup itself at some point. But we
cannot delete it as long as someone sees it, so I'll introduce a shared
refcount. It's incremented and decremented at the same points where
current postgres manipulates the local refcounts of catctup and catclist
(that is, SearchCatCache/ReleaseCatCache), which prevents a catctup from
being deleted while a catclist is in use, or vice versa. So a shared
catctup is deleted when its shared refcount becomes zero and its obsolete
flag is on. Once it has vanished from the shared cache, the obsolete
entry never comes back, because a process that misses in the shared hash
table has already read the sinval messages (in any case it reads them
when opening a table and taking a lock).
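A minimal sketch of the deletion rule (shared refcount zero and obsolete flag on), assuming hypothetical function names and a toy `removed` marker instead of unlinking and freeing. A real version would need atomics or a lock around the refcount and flag, and the catclist interplay is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct SharedCatCTup
{
    int  refcount;   /* pins held by all backends */
    bool obsolete;   /* set when a matching sinval msg was queued */
    bool removed;    /* toy marker: would be unlinked from the shared table */
} SharedCatCTup;

/* Delete only when nobody references it and it has been marked obsolete. */
static void
maybe_delete(SharedCatCTup *ct)
{
    if (ct->refcount == 0 && ct->obsolete)
        ct->removed = true;
}

/* Taken where SearchCatCache() pins a catctup. */
static void
shared_catctup_pin(SharedCatCTup *ct)
{
    assert(!ct->removed);
    ct->refcount++;
}

/* Taken where ReleaseCatCache() unpins it. */
static void
shared_catctup_release(SharedCatCTup *ct)
{
    assert(ct->refcount > 0);
    ct->refcount--;
    maybe_delete(ct);
}

/* Called when a matching sinval message is queued. */
static void
shared_catctup_mark_obsolete(SharedCatCTup *ct)
{
    ct->obsolete = true;
    maybe_delete(ct);
}
```

Whichever of the two events happens last (final release or obsolete marking) performs the deletion.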

As a first step, I'll make a PoC without focusing on performance, and
I'll use SharedMemoryContext (ShmContext) [2], which I'm developing to
allocate/free shared items via palloc/pfree.

[1] https://www.pgcon.org/2019/schedule/attachments/548_Challenges%20of%20Concurrent%20DDL.pdf
[2] https://commitfest.postgresql.org/23/2166/

---
Regards,
Takeshi Ideriha
