Re: Tablespaces

From: Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>
To: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Tablespaces
Date: 2004-02-26 11:25:24
Message-ID: 403DD7A4.6020804@cybertec.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-hackers-win32

Gavin Sherry wrote:
> Hi all,
>
> I've been looking at implementing table spaces for 7.5. Some notes and
> implementation details follow.
>
> ------
>
> Type of table space:
>
> There are many different table space implementations in relational
> database management systems. In my implementation, a table space in
> PostgreSQL will be the location of a directory on the file system in
> which files backing database objects can be stored. Global tables and
> non 'data' objects (WAL, CLOG, config files) will all remain in $PGDATA.
> $PGDATA/base will be the default table space.

Is it possible to put WALs and CLOGs into different tablespaces? (maybe
different RAID systems). Some companies want that ...

> A given table space will be identified by a unique table space name. I
> haven't decided if 'unique' should mean database-wide unique or
> cross-database unique. It seems to me that we might run into problems
> with CREATE DATABASE ... TEMPLATE = <database with table spaces> if the
> uniqueness of table spaces is limited to the database level.

I strongly vote for database cluster wide unique names because somebody
could have a tablespace "webusers" or something like that. To me this
makes far more sense.

> A table space parameter will be added to DDL commands which create
> physical database objects (CREATE DATABASE/INDEX/TABLE/SEQUENCE) and to
> CREATE SCHEMA. The associated routines, as well as the corresponding DROP
> commands will need to be updated. Adding the ability to ALTER <object>
> TABLESPACE <name> seems a little painful. Would people use it? Comments?

I think people won't need it in first place because this seems to be
really painful.
What really matters is that the number of tablespaces and file /
tablespace is unlimited. SAP DB has limited the number of devspaces to
32 (I think). This is real bull.... because if your database grows
unexpectedly you are in deep trouble (expert database design by SAP,
MySQL and 100000....0000 others).

> When an object is created the system will resolve the table space the
> object is stored in as follows: if the table space paramater is passed to
> the DDL command, then the object is stored in that table space (given
> validation of the table space, etc). If it is not passed, the object
> inherits its "parent's" table space where the parent/child hierarchy is as
> follows: database > schema > table > [index|sequence]. So, if you issued:
>
> create table foo.bar (...);
>
> We would first not that there is no TABLESPACE <name>, then cascade to
> the table space for the schema 'foo' (and possibly cascade to the table
> space for the database). A database which wasn't created with an explicit
> table space will be created under the default table space. This ensures
> backward compatibility.

Will users automatically be assigned to a certain table space? How is
this going to work?

> Creating a table space:
>
> A table space is a directory structure. The directory structure is as
> follows:
>
> [swm(at)dev /path/to/tblspc]$ ls
> OID1/ OID2/
>
> OID1 and OID2 are the OIDs of databases which have created a table space
> against this file system location. In this respect, a table space
> resembles $PGDATA/base. I thought it useful to keep this kind of
> namespace mechanism in place so that administrators do not need to create
> hierarchies of names on different partitions if they want multiple
> databases to use the same partition.
>
> The actual creation of the table space will be done with:
>
> CREATE TABLE SPACE <name> LOCATION </path/to/tblspc>;
>
> Before creating the table space we must:
>
> 1) Check if the directory exists. If it does, create a sub directory as
> the OID of the current database.
>
> 2) Alternatively, if the directory doesn't exist, attempt to create it,
> then the sub directory.
>
> I wonder if a file, such as PG_TBLSPC, should be added to the table space
> directory so that, in the case of an existing non-empty directory, we can
> attempt to test if the directory is being used for something else and
> error out. Seems like:
>
> CREATE TABLESPACE tbl1 LOCATION '/var/'
>
> which will result in something like '/var/123443' is a bad idea. Then
> again, the user should know better. Comments?
>
> If everything goes well, we add an entry to pg_tablespace with the table
> space location and name (and and OID).
>
>
> Tying it all together:
>
> The catalogs pg_database, pg_namespace, and pg_class will have a 'tblspc'
> field. This will be the OID of the table space the object resides in, or 0
> (default table space). Since we can then resolve relid/relname, schema and
> database to a tablespace, there aren't too many cases when extra logic
> needs to be added to the IO framework. In fact, most of it is taken care
> of because of the abstraction of relpath().
>
> The creation of table spaces will need to be recorded in xlog in the same
> way that files are in heap_create() with the corresponding delete logic
> incase of ABORT.
>
>
> Postmaster startup:
>
> Ideally, the postmaster at startup should go into each tblspc/databaseoid
> directory and check for a postmaster.pid file to see if some other
> instance is touching the files we're interested in. This will require a
> control file listing tblspc/databaseoid paths and it will need to plug
> into WAL in case we die during CREATE TABLESPACE. Comments?
>
>
> Creating a database
>
> I think that createdb() is going to have to be reworked if pg_tablespace
> isn't shared (ie, tablespaces are only database unique). The reason being
> that if we create a database which has a table space, pg_tablespace in the
> new database will have to be updated and that cannot be done atomically
> with the `cp` based mechanism we currently use.
>
> I think I'm going to have to get my hands dirty before I can tell the
> extent to which createdb() will need reworking.
>
>
> pg_dump
>
> Obviously pg_dump will need to be able to dump table spaces. pg_dump
> running against <7.5 will DDL commands without a table space parameter and
> as such the database's physical layout, when loaded into 7.5, will be the
> same as for <7.5.
>
> ---
>
> Comments? Questions? Suggestions?
>
> Thanks,
>
> Gavin

Do you plan support for limiting the size of a tablespace? ISPs will
vote for that because they can limit the size of a database on the
database level rather than on the operating system level. Of course
this can and (should???) be done on the operation system level but
people will definitely ask for that.
If sizing is not supported we should definitely provide minor
documentation which tells people how to do that on the operating system
level (at least poting to some useful information).

Best regards,

Hans

--
Cybertec Geschwinde u Schoenig
Schoengrabern 134, A-2020 Hollabrunn, Austria
Tel: +43/2952/30706 or +43/664/233 90 75
www.cybertec.at, www.postgresql.at, kernel.cybertec.at

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Gavin Sherry 2004-02-26 11:58:25 Re: Tablespaces
Previous Message Richard Huxton 2004-02-26 11:25:23 Re: Tablespaces

Browse pgsql-hackers-win32 by date

  From Date Subject
Next Message Gavin Sherry 2004-02-26 11:58:25 Re: Tablespaces
Previous Message Richard Huxton 2004-02-26 11:25:23 Re: Tablespaces