Re: Cache relation sizes?

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Cache relation sizes?
Date: 2020-11-16 22:42:53
Message-ID: CA+hUKGLfYgBz7_w=oOMDYvReuNr1u-xQdizTbTiHpVKwbAyvOQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Nov 16, 2020 at 11:01 PM Konstantin Knizhnik
<k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
> I will look at your implementation more precisely latter.

Thanks! Warning: I thought about making a thing like this for a
while, but the patch itself is only a one-day prototype, so I am sure
you can find many bugs... Hmm, I guess the smgrnblocks_fast() code
really needs a nice big comment to explain the theory about why this
value is fresh enough, based on earlier ideas about implicit memory
barriers. (Something like: if you're extending, you must have
acquired the extension lock; if you're scanning you must have acquired
a snapshot that only needs to see blocks that existed at that time and
concurrent truncations are impossible; I'm wondering about special
snapshots like VACUUM, and whether this is too relaxed for that, or if
it's covered by existing horizon-based interlocking...).

> But now I just to ask one question: is it reasonable to spent
> substantial amount of efforts trying to solve
> the problem of redundant calculations of relation size, which seems to
> be just part of more general problem: shared relation cache?

It's a good question. I don't know exactly how to make a shared
relation cache (I have only some vague ideas and read some of
Idehira-san's work) but I agree that it would be very nice. However,
I think that's an independent and higher level thing, because Relation
!= SMgrRelation.

What is an SMgrRelation? Not much! It holds file descriptors, which
are different in every backend so you can't share them, and then some
information about sizes and a target block for insertions. With this
patch, the sizes become shared and more reliable, so there's *almost*
nothing left at all except for the file descriptors and the pointer to
the shared object. I haven't looked into smgr_targblock, the last
remaining non-shared thing, but there might be an opportunity to do
something clever for parallel inserters (the proposed feature) by
coordinating target blocks in SMgrSharedRelation; I don't know.

> It is well known problem: private caches of system catalog can cause
> unacceptable memory consumption for large number of backends.

Yeah.

> Also private caches requires sophisticated and error prone invalidation
> mechanism.
>
> If we will have such shared cache, then keeping shared relation size
> seems to be trivial task, isn't it?
> So may be we before "diving" into the problem of maintaining shared pool
> of dirtied "inodes" we can better discuss and conerntrate on solving
> more generic problem?

Sure, we can talk about how to make a shared relation cache (and
syscache etc). I think it'll be much harder than this shared SMGR
concept. Even if we figure out how to share palloc'd trees of nodes
with correct invalidation and interlocking (ie like Ideriha-san's work
in [1]), including good performance and space management/replacement,
I guess we'll still need per-backend Relation objects to point to the
per-backend SMgrRelation objects to hold the per-backend file
descriptors. (Until we have threads...). But... do you think sharing
SMGR relations to the maximum extent possible conflicts with that?
Are you thinking there should be some general component here, perhaps
a generic system for pools of shmem objects with mapping tables and a
replacement policy?

[1] https://www.postgresql.org/message-id/flat/4E72940DA2BF16479384A86D54D0988A567B9245%40G01JPEXMBKW04

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2020-11-16 22:52:48 Re: VACUUM (DISABLE_PAGE_SKIPPING on)
Previous Message luis.roberto 2020-11-16 22:33:07 Re: enable_incremental_sort changes query behavior