Re: Extensible storage manager API - SMGR hook Redux

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Kirill Reshke <reshkekirill(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Extensible storage manager API - SMGR hook Redux
Date: 2023-12-04 21:30:36
Message-ID: CAEze2WhmGiwHCunAAwn66RKBpTnTrJ+LewoYSGHmtJKq_ZvmeA@mail.gmail.com
Lists: pgsql-hackers

On Mon, 4 Dec 2023 at 22:03, Kirill Reshke <reshkekirill(at)gmail(dot)com> wrote:
>
> On Mon, 4 Dec 2023 at 22:21, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com> wrote:
>>
>> On Mon, 4 Dec 2023 at 17:51, Kirill Reshke <reshkekirill(at)gmail(dot)com> wrote:
>> >
>> > So, the 0002 patch uses the `get_tablespace` function, which searches the catalog for the tablespace's SMGR id. I wonder how `smgr_redo` would work with it?
>>
>> That's a very good point I hadn't considered in detail yet. Quite
>> clearly, the current code is wrong in assuming that the catalog is
>> accessible, and it should probably be stored in a way similar to
>> pg_filenode.map in a file managed outside the buffer pool.
>>
> Hmm, pg_filenode.map is a nice idea. So, simply maintain TableSpaceOId -> smgr id mapping in a separate file and update the whole file on any changes, right?
> Looks reasonable to me, but it is clear that this solution can be really slow in some patterns, like if we create many-many tablespaces (the way you suggested it in the per-relation SMGR feature). Maybe we can store the data in separate files somehow, and only update one chunk per operation.

Yes, but that's a later issue... I'm not sure many-many tablespaces is
actually a good thing. There are already very few reasons to store
tables in more than just the default tablespace. For temporary
relations, there is indeed a GUC (temp_tablespaces) to automatically
put them into one tablespace; and I can see a similar thing being
useful for unlogged relations, too. Beyond that, I can see
high-performance local disks vs lower-performance (but cheaper) local
disks also as something reasonable. But that only gets us to ~6
tablespaces, assuming separate tablespaces for each combination of
(normal, temp, unlogged) * (fast, cheap). I'm not sure there are many
other reasons to add tablespaces, let alone making one for each table.

Note that you can select which tablespace a table is stored in, so I
see very little reason to actually do something about large numbers of
tablespaces being prohibitively expensive performance-wise.

Why do you want to have a whole new storage configuration for each of
your relations?

> Anyway, if we use a `pg_filenode.map`-like solution, we need to reuse its code infrastructure, right? For example, it seems that the code that calculates checksums can be reused.
> So, we need to refactor code here, define something like FileMap API maybe. Or is it not really worth it? We can just write similar code twice.

I'm not sure about that. I really doubt we'll need things that are
that similar: right now, the tablespace->smgr mapping could be
considered to be implied by the symlinks in /pg_tblspc/. Non-MD
tablespaces could add a file <oid>.tblspc that details their
configuration, which would also fix the issue of spcoid->smgr mapping.
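
To make that idea a bit more concrete, here is a rough standalone
sketch of such a lookup. Everything in it is an assumption for
illustration only: the function name lookup_smgr_id, the SMGR_ID_MD
constant, and the one-line "smgr_id=<n>" file format are hypothetical
and not taken from the patch or from PostgreSQL. It only shows that
the spcoid->smgr mapping could be resolved from the filesystem,
without any catalog access, which is what redo would need.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical id for the default "md" storage manager. */
#define SMGR_ID_MD 0

/*
 * Resolve the smgr id of a tablespace without touching the catalog.
 *
 * Assumed layout (illustration only): <tblspc_dir>/<spcoid>.tblspc
 * contains a single line "smgr_id=<n>".  A missing file means the
 * tablespace is a plain MD tablespace.
 *
 * Returns the smgr id, or -1 if the file cannot be read or parsed.
 */
static int
lookup_smgr_id(const char *tblspc_dir, unsigned int spcoid)
{
    char        path[1024];
    int         len;
    int         smgr_id;
    FILE       *fp;

    assert(tblspc_dir != NULL);

    len = snprintf(path, sizeof(path), "%s/%u.tblspc", tblspc_dir, spcoid);
    if (len < 0 || (size_t) len >= sizeof(path))
        return -1;              /* path did not fit in the buffer */

    fp = fopen(path, "r");
    if (fp == NULL)
    {
        /* No config file: the mapping is implied, so fall back to MD. */
        return (errno == ENOENT) ? SMGR_ID_MD : -1;
    }

    /* Anything other than exactly one parsed integer is an error. */
    if (fscanf(fp, "smgr_id=%d", &smgr_id) != 1)
        smgr_id = -1;

    fclose(fp);
    return smgr_id;
}
```

A real implementation would of course need durability and crash
safety for that file; the sketch deliberately ignores both.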

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)
