Re: Drop type "smgr"?

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Shawn Debnath <sdn(at)amazon(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Drop type "smgr"?
Date: 2019-03-01 08:11:02
Message-ID: 0ce6628f-d73a-0710-aa7b-3811ffc0f69e@postgrespro.ru
Lists: pgsql-hackers

On 01.03.2019 1:32, Thomas Munro wrote:
> On Fri, Mar 1, 2019 at 10:41 AM Shawn Debnath <sdn(at)amazon(dot)com> wrote:
>> On Fri, Mar 01, 2019 at 10:33:06AM +1300, Thomas Munro wrote:
>>> It doesn't make any sense to put things like clog or any other SLRU in
>>> a non-default tablespace though. It's perfectly OK if not all smgr
>>> implementations know how to deal with tablespaces, and the SLRU
>>> support should just not support that.
>> If the generic storage manager, or whatever we end up calling it, ends
>> up being generic enough - it's possible that the tablespace value would have
>> to be respected.
> Right, you and I have discussed this a bit off-list, but for the
> benefit of others, I think what you're getting at with "generic
> storage manager" here is something like this: on the one hand, our
> proposed revival of SMGR as a configuration point is about
> supporting alternative file layouts for bufmgr data, but at the same
> time there is some background noise about direct IO, block encryption,
> ... and who knows what alternative block storage someone might come up
> with ... at the block level. So although it sounds a bit
> contradictory to be saying "let's make all these different SMGRs!" at
> the same time as saying "but we'll eventually need a single generic
> SMGR that is smart enough to be parameterised for all of these
> layouts!", I see why you say it. In fact, the prime motivation for
> putting SLRUs into shared buffers is to get better buffering, because
> (anecdotally) slru.c's mini-buffer scheme performs abysmally without
> the benefit of an OS page cache. If we add optional direct IO support
> (something I really want), we need it to apply to SLRUs, undo and
> relations, ideally without duplicating code, so we'd probably want to
> chop things up differently. At some point I think we'll need to
> separate the questions "how to map blocks to filenames and offsets"
> and "how to actually perform IO". I think the first question would be
> controlled by the SMGR IDs as discussed, but the second question
> probably needs to be controlled by GUCs that control all IO, and/or
> special per relation settings (supposing you can encrypt just one
> table, as a random example I know nothing about); but that seems way
> out of scope for the present projects. IMHO the best path from here
> is to leave md.c totally untouched for now as the SMGR for plain old
> relations, while we work on getting these new kinds of bufmgr data
> into the tree as a first step, and a later hypothetical direct IO or
> whatever project can pay for the refactoring to separate IO from
> layout.
>

I completely agree with this statement:

At some point I think we'll need to separate the questions "how to map blocks to filenames and offsets" and "how to actually perform IO".
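
Just to illustrate the kind of separation I mean, here is a very rough
sketch (all names are hypothetical, this is not actual Postgres code):
one table answers "which file and offset holds this block", another
answers "how the bytes are actually read and written" (buffered,
direct IO, encrypted, ...).

#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

typedef uint32_t BlockNumber;

/* "how to map blocks to filenames and offsets" -- selected by SMGR id */
typedef struct BlockLayoutRoutine
{
    void    (*map_block) (BlockNumber blkno,
                          char *path, size_t pathlen, off_t *offset);
} BlockLayoutRoutine;

/* "how to actually perform IO" -- selected by GUC and/or per-relation option */
typedef struct BlockIORoutine
{
    ssize_t (*read_at) (const char *path, off_t offset,
                        void *buf, size_t len);
    ssize_t (*write_at) (const char *path, off_t offset,
                         const void *buf, size_t len);
} BlockIORoutine;

/* Reading a block then becomes "map, then read": */
static ssize_t
read_block(const BlockLayoutRoutine *layout, const BlockIORoutine *io,
           BlockNumber blkno, void *buf, size_t len)
{
    char    path[1024];
    off_t   offset;

    layout->map_block(blkno, path, sizeof(path), &offset);
    return io->read_at(path, offset, buf, len);
}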

There are two subsystems developed in PgPro which are integrated into
Postgres at the file IO level: CFS (compressed file system) and SnapFS
(fast database snapshots).
The first provides page-level encryption and compression, the second a
mechanism for quickly restoring database state.
Both are implemented by patching fd.c. My first idea was to implement
them as alternative storage managers (alternatives to md.c), but that
would require duplicating all of the segment mapping logic from md.c
plus the file descriptor cache from fd.c. It would be nice if it were
possible to redefine the raw file operations (FileWrite, FileRead, ...)
without affecting the segment mapping logic.
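
Something like the following rough sketch is what I mean (again, purely
hypothetical names; the real FileRead/FileWrite in fd.c have different
signatures): md.c would keep calling the fd.c entry points as today, but
they would dispatch through a replaceable table of raw file operations,
so CFS or SnapFS could install their own versions without duplicating
the segment mapping.

#include <sys/types.h>
#include <unistd.h>

typedef int File;               /* stand-in for fd.c's virtual file handle */

/* Replaceable raw file operations, consulted by fd.c */
typedef struct FileAccessRoutine
{
    ssize_t (*file_read) (File file, void *buf, size_t amount, off_t offset);
    ssize_t (*file_write) (File file, const void *buf, size_t amount,
                           off_t offset);
} FileAccessRoutine;

/* Default implementation: plain pread/pwrite, as today */
static ssize_t
default_file_read(File file, void *buf, size_t amount, off_t offset)
{
    return pread(file, buf, amount, offset);
}

static ssize_t
default_file_write(File file, const void *buf, size_t amount, off_t offset)
{
    return pwrite(file, buf, amount, offset);
}

/* CFS/SnapFS would install their own table here instead */
static FileAccessRoutine file_access = {
    default_file_read,
    default_file_write
};

/* fd.c's read path would then reduce to a dispatch like this: */
static ssize_t
file_read_dispatch(File file, void *buf, size_t amount, off_t offset)
{
    return file_access.file_read(file, buf, amount, offset);
}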

One more thing... From my point of view, one of the drawbacks of
Postgres is that it requires an underlying file system and is not able
to work with raw partitions.
It seems to me that bypassing the file system layer could significantly
improve performance and give more possibilities for IO performance
tuning.
Certainly it would require a lot of changes in the Postgres storage
layer, so this is not something I suggest implementing or even
discussing right now.
But it may be useful to keep it in mind in discussions concerning a
"generic storage manager".

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
