Re: Cache relation sizes?

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Cache relation sizes?
Date: 2020-08-04 02:21:28
Message-ID: CA+hUKG+xDZs2i2Tc17MYXmSG2B0a_xOnSdh2WintxMaHOX-WZg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Aug 4, 2020 at 3:54 AM Konstantin Knizhnik
<k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
> So in this thread three solutions are proposed:
> 1. "bullet-proof general shared invalidation"
> 2. recovery-only solution avoiding shared memory and invalidation
> 3. "relaxed" shared memory cache with simplified invalidation

Hi Konstantin,

By the way, point 2 is now committed (c5315f4f). As for 1 vs 3, I
wasn't proposing two different invalidation techniques: in both
approaches, I'm calling the cached values "relaxed", meaning that
their freshness is controlled by memory barriers elsewhere that the
caller has to worry about. I'm just suggesting for idea 3 that it
might be a good idea to use relaxed values only in a couple of hot
code paths where we do the analysis required to convince ourselves
that memory barriers are already in the right places to make it safe.
By "bullet-proof" I meant that we could in theory convince ourselves
that *all* users of smgrnblocks() can safely use relaxed values, but
that's hard.
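
To make "relaxed" concrete, here is a minimal hypothetical sketch (the
struct and function names are invented; this is not the posted patch):
a per-relation size kept in shared memory as an atomic, read without
any lock on hot paths, so that its freshness depends entirely on
barriers the caller has already executed:

#include "postgres.h"
#include "port/atomics.h"

typedef struct CachedRelSize
{
    pg_atomic_uint64 nblocks;   /* cached size in blocks */
} CachedRelSize;

/*
 * Writer: called after extending or truncating the relation.  Whatever
 * lock or barrier protects the extension itself also orders this store
 * for readers that synchronize on the same lock.
 */
static void
cached_size_publish(CachedRelSize *entry, uint64 new_nblocks)
{
    pg_atomic_write_u64(&entry->nblocks, new_nblocks);
}

/*
 * Hot-path reader: no lock, no barrier of its own.  The value is only
 * as fresh as the barriers the caller has already executed, which is
 * exactly the per-call-site analysis described above.
 */
static uint64
cached_size_read_relaxed(CachedRelSize *entry)
{
    return pg_atomic_read_u64(&entry->nblocks);
}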

That said, the sketch patch I posted certainly needs more work, and
maybe someone has a better idea on how to do it.

> If solving such an important but still specific problem as caching
> relation size requires so much effort,
> then maybe it is time to go further in the direction of a shared
> catalog?

I wouldn't say it requires too much effort, at least not for the
conservative approach (3). But I also agree with what you're saying,
in the long term:

> This shared relation cache can easily store relation size as well.
> In addition it will solve a lot of other problems:
> - noticeable overhead of local relcache warming
> - large memory consumption in the case of a large number of relations,
> O(max_connections*n_relations)
> - sophisticated invalidation protocol and related performance issues
> Certainly access to the shared cache requires extra synchronization. But
> DDL operations are relatively rare.
> So in most cases we will have only shared locks. Maybe the overhead of
> locking will not be too large?
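
To illustrate the read-mostly locking pattern Konstantin describes, a
hypothetical sketch (the entry layout and lock are invented) using the
LWLock API: lookups take the lock in shared mode, and only rare DDL
takes it exclusively:

#include "postgres.h"
#include "storage/lwlock.h"

/* Hypothetical shared relation cache entry. */
typedef struct SharedRelCacheEntry
{
    Oid         relid;
    uint64      nblocks;
    /* ... other cached catalog fields ... */
} SharedRelCacheEntry;

static LWLock *shared_relcache_lock;    /* assigned at startup */

/* The common case: readers take only a shared lock. */
static uint64
shared_relcache_get_nblocks(SharedRelCacheEntry *entry)
{
    uint64      result;

    LWLockAcquire(shared_relcache_lock, LW_SHARED);
    result = entry->nblocks;
    LWLockRelease(shared_relcache_lock);
    return result;
}

/* The rare case: DDL updates the entry under an exclusive lock. */
static void
shared_relcache_set_nblocks(SharedRelCacheEntry *entry, uint64 nblocks)
{
    LWLockAcquire(shared_relcache_lock, LW_EXCLUSIVE);
    entry->nblocks = nblocks;
    LWLockRelease(shared_relcache_lock);
}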

Yeah, I would be very happy if we get high-performance shared
sys/rel/plan/... caches in the future, and separately, having the
relation size available in shmem is something that has come up in
discussions about other topics too (tree-based buffer mapping,
multi-relation data files, ...). I agree with you that our cache
memory usage is a big problem, and it will be great to fix that one
day. I don't think that should stop us from making small improvements
to the existing design in the meantime, though. "The perfect is the
enemy of the good." Look at all this trivial stuff:

https://wiki.postgresql.org/wiki/Syscall_Reduction

I don't have high quality data to report yet, but from simple tests
I'm seeing orders of magnitude fewer syscalls per pgbench transaction
in recovery, when comparing 11 to 14devel, due to various changes.
Admittedly, the size probes in regular backends aren't quite as bad as
the recovery ones were, because they're already limited by the rate at
which you can throw request/response cycles at the database, but there
are still cases, such as planning with high partition counts or on
slow filesystems, that can benefit measurably from caching.
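
For context, the probe a cache avoids is essentially one syscall per
call. A simplified standalone sketch of what the uncached path boils
down to (file_nblocks is an invented name, and error handling is
omitted):

#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define BLCKSZ 8192             /* PostgreSQL's default block size */

/* Uncached size probe: one lseek() syscall on every call. */
static uint64_t
file_nblocks(int fd)
{
    off_t       len = lseek(fd, 0, SEEK_END);

    return (uint64_t) (len / BLCKSZ);
}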
