Re: Drop type "smgr"?

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Drop type "smgr"?
Date: 2019-02-28 21:33:06
Message-ID: CA+hUKGKJqEt+sW7Q+7a9JQY6WUSdvrRS2cWByQYO+pPDnJjbwQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Mar 1, 2019 at 4:09 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
> > On Thu, Feb 28, 2019 at 7:37 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
> >>> Our current thinking is that smgropen() should know how to map a small
> >>> number of special database OIDs to different smgr implementations
>
> >> Hmm. Maybe mapping based on tablespaces would be a better idea?
>
> > In the undo log proposal (about which more soon) we are using
> > tablespaces for their real purpose, so we need that OID. If you SET
> > undo_tablespaces = foo then future undo data created by your session
> > will be written there, which might be useful for putting that IO on
> > different storage.
>
> Meh. That's a point, but it doesn't exactly seem like a killer argument.
> Just in the abstract, it seems much more likely to me that people would
> want per-database special rels than per-tablespace special rels. And
> I think your notion of a GUC that can control this is probably pie in
> the sky anyway: if we can't afford to look into the catalogs to resolve
> names at this code level, how are we going to handle a GUC?

I have this working like so:

* undo logs have a small amount of metadata in shared memory, which is
written to a file at checkpoint time, has all changes WAL-logged, and is
visible to users in the pg_stat_undo_logs view
* one of the properties of an undo log is its tablespace (the point
here being that it's not in a catalog)
* you don't need access to any catalogs to find the backing files for
a RelFileNode (the path via tablespace symlinks is derivable from
spcNode)
* therefore you can find your way from an UndoLogRecPtr in (say) a
zheap page to the relevant blocks on disk without any catalog access;
this should work even in the apparently (but not actually) circular
case of a pg_tablespace catalog that is stored in zheap (not something
we can do right now, but hypothetically speaking) and has undo data
stored in some non-default tablespace that must be consulted while
scanning the catalog (not that I'm suggesting it would necessarily be
a good idea to put catalogs in non-default tablespaces; I'm just
addressing your theoretical point)
* the GUC is used to resolve tablespace names to OIDs only by sessions
that are writing, when selecting (or creating) an undo log to attach
to and begin writing into; those sessions have no trouble reading the
catalog to do so without problematic circularities, as above
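To make the catalog-free lookup in the list above concrete, here is a minimal C sketch under loudly stated assumptions: the 24/40-bit split of an UndoLogRecPtr, the 1MB segment size, the file naming, and the `undo_log_meta` array standing in for shared memory are all hypothetical illustrations, not the patch's actual definitions. The only point it demonstrates is that the tablespace OID comes from undo log metadata, so the on-disk path can be built without reading pg_tablespace.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef uint32_t Oid;
typedef uint64_t UndoLogRecPtr;   /* hypothetical: log number + byte offset */

/* Hypothetical split: high 24 bits pick the undo log, low 40 bits are an offset. */
#define UNDO_LOG_NUMBER_BITS   24
#define UNDO_LOG_OFFSET_BITS   (64 - UNDO_LOG_NUMBER_BITS)
#define UNDO_SEGMENT_SIZE      UINT64_C(1048576)   /* hypothetical 1MB segments */
#define DEFAULT_TABLESPACE_OID 1663                /* pg_default */

/* Stand-in for per-log metadata held in shared memory, not in a catalog. */
typedef struct UndoLogMeta
{
    uint32_t logno;
    Oid      tablespace;
} UndoLogMeta;

static const UndoLogMeta undo_log_meta[] = {
    {0, DEFAULT_TABLESPACE_OID},
    {1, 16390},                   /* a user-created tablespace */
};

static uint32_t
undo_log_number(UndoLogRecPtr ptr)
{
    return (uint32_t) (ptr >> UNDO_LOG_OFFSET_BITS);
}

static uint64_t
undo_log_offset(UndoLogRecPtr ptr)
{
    return ptr & ((UINT64_C(1) << UNDO_LOG_OFFSET_BITS) - 1);
}

/*
 * Format the path of the segment file holding ptr.  Only the metadata
 * array and the tablespace OID are consulted; the tablespace's location
 * is reached through the pg_tblspc/<oid> symlink.  Returns -1 if the
 * undo log is unknown, else snprintf's result.
 */
static int
undo_segment_path(UndoLogRecPtr ptr, char *buf, size_t buflen)
{
    uint32_t logno = undo_log_number(ptr);
    uint64_t segstart = undo_log_offset(ptr) / UNDO_SEGMENT_SIZE * UNDO_SEGMENT_SIZE;

    assert(buf != NULL && buflen > 0);
    for (size_t i = 0; i < sizeof(undo_log_meta) / sizeof(undo_log_meta[0]); i++)
    {
        Oid spc;

        if (undo_log_meta[i].logno != logno)
            continue;
        spc = undo_log_meta[i].tablespace;
        if (spc == DEFAULT_TABLESPACE_OID)
            return snprintf(buf, buflen, "base/undo/%06" PRIX32 ".%010" PRIX64,
                            logno, segstart);
        return snprintf(buf, buflen, "pg_tblspc/%" PRIu32 "/undo/%06" PRIX32 ".%010" PRIX64,
                        spc, logno, segstart);
    }
    return -1;
}
```

For example, a pointer into log 1 at offset 0x250000 resolves to the segment starting at 0x200000 inside tablespace 16390, with no catalog read anywhere on the path.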

Seems to work; the main complications so far were coming up with
reasonable behaviour and interlocking when you drop tablespaces that
contain undo logs (short version: if they're not needed for snapshots
or rollback, they are dropped, wasting the rest of their undo address
space; otherwise they prevent the tablespace from being dropped with
a clear message to that effect).
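The drop-tablespace rule just described can be sketched as a two-pass check in C. This is a hypothetical illustration: the `UndoLogSlot` fields, the use of `discard < insert` as the test for "still needed for snapshots or rollback", and the function name are assumptions, and a real implementation would also need locking and WAL logging, which are omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Oid;
typedef uint64_t UndoLogOffset;

/* Hypothetical per-undo-log state, as it might sit in shared memory. */
typedef struct UndoLogSlot
{
    uint32_t      logno;
    Oid           tablespace;
    UndoLogOffset discard;   /* undo before this offset is no longer needed */
    UndoLogOffset insert;    /* next offset that would be written */
    bool          dropped;
} UndoLogSlot;

/*
 * Decide whether tablespace spc can be dropped.  If any live undo log in it
 * still holds needed data (discard < insert), nothing is changed, the
 * blocking log's number is stored in *blocking_logno so the caller can
 * report it, and false is returned.  Otherwise every undo log in the
 * tablespace is marked dropped, abandoning its remaining address space.
 */
static bool
undo_prepare_drop_tablespace(UndoLogSlot *slots, size_t nslots, Oid spc,
                             uint32_t *blocking_logno)
{
    assert(slots != NULL || nslots == 0);
    assert(blocking_logno != NULL);

    /* First pass: look for a log that still matters, without side effects. */
    for (size_t i = 0; i < nslots; i++)
    {
        const UndoLogSlot *s = &slots[i];

        if (!s->dropped && s->tablespace == spc && s->discard < s->insert)
        {
            *blocking_logno = s->logno;
            return false;
        }
    }

    /* Second pass: nothing is needed, so drop every log in the tablespace. */
    for (size_t i = 0; i < nslots; i++)
        if (!slots[i].dropped && slots[i].tablespace == spc)
            slots[i].dropped = true;

    return true;
}
```

Checking everything before changing anything means a refused drop leaves all logs intact, so the caller can simply raise an error naming the blocking log.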

It doesn't make any sense to put things like clog or any other SLRU in
a non-default tablespace, though. It's perfectly OK if not all smgr
implementations know how to deal with tablespaces; the SLRU
implementation should simply not support that.
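Combining the proposal from the start of the thread (a small number of special database OIDs mapped to different smgr implementations) with the point that SLRUs need not support tablespaces, one could imagine a per-implementation capability flag checked when storage is located. This is a hypothetical C sketch only: the reserved OID values, `SmgrImpl` (a cut-down stand-in, not PostgreSQL's `f_smgr`) and the NULL-on-unsupported convention are all made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Oid;

#define DEFAULT_TABLESPACE_OID 1663     /* pg_default */

/* Hypothetical reserved database OIDs selecting special storage managers. */
#define HYPOTHETICAL_UNDO_DB_OID 9
#define HYPOTHETICAL_CLOG_DB_OID 10

/* Cut-down stand-in for a storage manager's description. */
typedef struct SmgrImpl
{
    const char *name;
    bool        supports_tablespaces;
} SmgrImpl;

static const SmgrImpl md_smgr   = {"md", true};     /* ordinary relations */
static const SmgrImpl undo_smgr = {"undo", true};   /* undo logs may live anywhere */
static const SmgrImpl slru_smgr = {"slru", false};  /* clog etc.: default only */

/* Map a database OID to an smgr without consulting any catalog. */
static const SmgrImpl *
smgr_for_database(Oid dbNode)
{
    switch (dbNode)
    {
        case HYPOTHETICAL_UNDO_DB_OID:
            return &undo_smgr;
        case HYPOTHETICAL_CLOG_DB_OID:
            return &slru_smgr;
        default:
            return &md_smgr;
    }
}

/* Returns NULL when the chosen smgr cannot place files in spcNode. */
static const SmgrImpl *
smgr_select(Oid spcNode, Oid dbNode)
{
    const SmgrImpl *impl = smgr_for_database(dbNode);

    assert(impl != NULL);
    if (spcNode != DEFAULT_TABLESPACE_OID && !impl->supports_tablespaces)
        return NULL;
    return impl;
}
```

Keeping tablespace support as a property of each implementation lets the SLRU path reject non-default tablespaces without forcing every smgr to share one policy.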

> The real reason I'm concerned about this, though, is that for either
> a database or a tablespace, you can *not* get away with having a magic
> OID just hanging in space with no actual catalog row matching it.
> If nothing else, you need an entry there to prevent someone from
> reusing the OID for another purpose. And a pg_database row that
> doesn't correspond to a real database is going to break all kinds of
> code, starting with pg_upgrade and the autovacuum launcher. Special
> rows in pg_tablespace are much less likely to cause issues, because
> of the precedent of pg_global and pg_default.

GetNewObjectId() never returns values < FirstNormalObjectId.
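The guarantee in that reply can be modelled with a few lines of C. This is a simplified sketch of the wraparound rule in normal multi-user operation only; standalone/bootstrap mode, where initdb assigns some lower OIDs, is deliberately left out. The names `next_oid` and `new_object_id` are stand-ins, not the actual varsup.c code. The value 16384 matches FirstNormalObjectId.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t Oid;

#define FIRST_NORMAL_OBJECT_ID ((Oid) 16384)   /* FirstNormalObjectId */

static Oid next_oid = FIRST_NORMAL_OBJECT_ID;

/*
 * Simplified model of OID assignment under the postmaster: the 32-bit
 * counter wraps around, and whenever it is below FIRST_NORMAL_OBJECT_ID it
 * skips straight back to that boundary, so hard-coded OIDs below it are
 * never handed out by this path.
 */
static Oid
new_object_id(void)
{
    if (next_oid < FIRST_NORMAL_OBJECT_ID)
        next_oid = FIRST_NORMAL_OBJECT_ID;
    assert(next_oid >= FIRST_NORMAL_OBJECT_ID);
    return next_oid++;
}
```

After the counter reaches UINT32_MAX it wraps to 0, and the next call returns 16384 rather than any low, hand-assigned value.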

I don't think it's impossible that someone will want to put SMGRs in a
catalog of some kind someday. Even though the ones for clog, undo
etc. would probably still need special hard-coded treatment as
discussed, I suppose it's remotely possible that someone might someday
figure out a useful way to allow extensions that provide different
block storage (NVRAM? ZFS zvols? encryption? see Haribabu's reply),
but I don't have any specific ideas about that, nor do I feel inclined
to design something for an unknown future use.

--
Thomas Munro
https://enterprisedb.com
