From: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Drop type "smgr"? |
Date: | 2019-02-28 21:33:06 |
Message-ID: | CA+hUKGKJqEt+sW7Q+7a9JQY6WUSdvrRS2cWByQYO+pPDnJjbwQ@mail.gmail.com |
Lists: | pgsql-hackers |
On Fri, Mar 1, 2019 at 4:09 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
> > On Thu, Feb 28, 2019 at 7:37 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
> >>> Our current thinking is that smgropen() should know how to map a small
> >>> number of special database OIDs to different smgr implementations
>
> >> Hmm. Maybe mapping based on tablespaces would be a better idea?
>
> > In the undo log proposal (about which more soon) we are using
> > tablespaces for their real purpose, so we need that OID. If you SET
> > undo_tablespaces = foo then future undo data created by your session
> > will be written there, which might be useful for putting that IO on
> > different storage.
>
> Meh. That's a point, but it doesn't exactly seem like a killer argument.
> Just in the abstract, it seems much more likely to me that people would
> want per-database special rels than per-tablespace special rels. And
> I think your notion of a GUC that can control this is probably pie in
> the sky anyway: if we can't afford to look into the catalogs to resolve
> names at this code level, how are we going to handle a GUC?
I have this working like so:
* undo logs have a small amount of meta-data in shared memory, stored
in a file at checkpoint time, with all changes WAL logged, visible to
users in pg_stat_undo_logs view
* one of the properties of an undo log is its tablespace (the point
here being that it's not in a catalog)
* you don't need access to any catalogs to find the backing files for
a RelFileNode (the path via tablespace symlinks is derivable from
spcNode)
* therefore you can find your way from an UndoLogRecPtr in (say) a
zheap page to the relevant blocks on disk without any catalog access;
this should work even in the apparently (but not actually) circular
case of a pg_tablespace catalog that is stored in zheap (not something
we can do right now, but hypothetically speaking), and has undo data
that is stored in some non-default tablespace that must be consulted
while scanning the catalog (not that I'm suggesting it would
necessarily be a good idea to support catalogs in non-default
tablespaces; I'm just addressing your theoretical point)
* the GUC is used to resolve tablespace names to OIDs only by sessions
that are writing, when selecting (or creating) an undo log to attach
to and begin writing into; those sessions have no trouble reading the
catalog to do so without problematic circularities, as above
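To illustrate the "no catalog access" point, here is a minimal sketch of how a relative file path can be computed from the three OIDs in a RelFileNode alone. It follows the layout used for ordinary relation files; the function name `sketch_relation_path` and the `SKETCH_VERSION_DIR` placeholder are hypothetical, the struct is a simplified stand-in, and the layout of undo log files in the proposal may differ.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef unsigned int Oid;

/* Well-known OIDs of the pg_default and pg_global tablespaces. */
#define DEFAULTTABLESPACE_OID 1663
#define GLOBALTABLESPACE_OID  1664

/*
 * Assumption: placeholder for the per-version subdirectory inside a
 * tablespace.  The real name depends on the server and catalog version.
 */
#define SKETCH_VERSION_DIR "PG_VERSION_DIR"

/* Simplified stand-in for the RelFileNode struct. */
typedef struct RelFileNode
{
    Oid spcNode;    /* tablespace */
    Oid dbNode;     /* database */
    Oid relNode;    /* relation */
} RelFileNode;

/*
 * Write the data-directory-relative path for rnode into buf, using only
 * the OIDs themselves.  Returns the value returned by snprintf().
 */
static int
sketch_relation_path(char *buf, size_t buflen, RelFileNode rnode)
{
    assert(buf != NULL && buflen > 0);

    if (rnode.spcNode == GLOBALTABLESPACE_OID)
        return snprintf(buf, buflen, "global/%u", rnode.relNode);

    if (rnode.spcNode == DEFAULTTABLESPACE_OID)
        return snprintf(buf, buflen, "base/%u/%u",
                        rnode.dbNode, rnode.relNode);

    /* Any other tablespace is reached through its symlink in pg_tblspc. */
    return snprintf(buf, buflen, "pg_tblspc/%u/%s/%u/%u",
                    rnode.spcNode, SKETCH_VERSION_DIR,
                    rnode.dbNode, rnode.relNode);
}
```

Nothing here consults pg_tablespace: the tablespace OID is enough to pick the symlink under pg_tblspc, and the filesystem resolves the rest.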
Seems to work; the main complications so far were coming up with
reasonable behaviour and interlocking when you drop tablespaces that
contain undo logs (short version: if they're not needed for snapshots
or rollback, they are dropped, wasting the rest of their undo address
space; otherwise they prevent the tablespace from being dropped with
a clear message to that effect).
It doesn't make any sense to put things like clog or any other SLRU in
a non-default tablespace though. It's perfectly OK if not all smgr
implementations know how to deal with tablespaces, and the SLRU
support should just not support that.
> The real reason I'm concerned about this, though, is that for either
> a database or a tablespace, you can *not* get away with having a magic
> OID just hanging in space with no actual catalog row matching it.
> If nothing else, you need an entry there to prevent someone from
> reusing the OID for another purpose. And a pg_database row that
> doesn't correspond to a real database is going to break all kinds of
> code, starting with pg_upgrade and the autovacuum launcher. Special
> rows in pg_tablespace are much less likely to cause issues, because
> of the precedent of pg_global and pg_default.
GetNewObjectId() never returns values < FirstNormalObjectId.
I don't think it's impossible for someone to want to put SMGRs in a
catalog of some kind some day. Even though the ones for clog, undo
etc would still probably need special hard-coded treatment as
discussed, I suppose it's remotely possible that someone might some
day figure out a useful way to allow extensions that provide different
block storage (nvram? zfs zvols? encryption? (see Haribabu's reply))
but I don't have any specific ideas about that or feel inclined to
design something for unknown future use.
--
Thomas Munro
https://enterprisedb.com