Re: Big 7.1 open items

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jan Wieck <JanWieck(at)Yahoo(dot)com>
Cc: Hiroshi Inoue <Inoue(at)tpf(dot)co(dot)jp>, Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, "Ross J(dot) Reedstrom" <reedstrm(at)rice(dot)edu>
Subject: Re: Big 7.1 open items
Date: 2000-06-16 15:46:41
Message-ID: 7458.961170401@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-patches

JanWieck(at)t-online(dot)de (Jan Wieck) writes:
> Tom Lane wrote:
>> It gets a little trickier if you want to be able to split
>> multi-gig tables across several tablespaces, though, since
>> you couldn't just append ".N" to the base table path in that
>> scenario.
>>
>> I'd be interested to know what sort of facilities Oracle
>> provides for managing huge tables...

> Oracle tablespaces are a collection of 1...n preallocated
> files. Each table then is bound to a tablespace and
> allocates extents (chunks) from those files.

OK, to get back to the point here: so in Oracle, tables can't cross
tablespace boundaries, but a tablespace itself could span multiple
disks?

Not sure if I like that better or worse than equating a tablespace
with a directory (so, presumably, all the files within it live on
one filesystem) and then trying to make tables able to span
tablespaces. We will need to do one or the other though, if we want
to have any significant improvement over the current state of affairs
for large tables.

One way is to play the flip-the-path-ordering game some more,
and access multiple-segment tables with pathnames like this:

.../TABLESPACE/RELATION -- first or only segment
.../TABLESPACE/N/RELATION -- N'th extension segment

This isn't any harder for md.c to deal with than what we do now,
but by making the /N subdirectories be symlinks, the dbadmin could
easily arrange for extension segments to go on different filesystems.
Also, since /N subdirectory symlinks can be added as needed,
expanding available space by attaching more disks isn't hard.
(If the admin hasn't pre-made a /N symlink when it's needed,
I'd envision the backend just automatically creating a plain
subdirectory so that it can extend the table.)
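To make the proposed layout concrete, here is a minimal C sketch of the path computation md.c would need under this scheme. It also covers the fallback of creating a plain subdirectory when the admin hasn't pre-made a /N symlink. The helper names segment_path and ensure_segment_dir, the use of the relation OID as the file name, and the 0700 permissions are illustrative assumptions, not existing backend code.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper: build the pathname of segment `segno` of relation
 * `relid` under tablespace directory `tsdir`, per the proposed layout:
 *   segno == 0  ->  TSDIR/RELATION      (first or only segment)
 *   segno >= 1  ->  TSDIR/N/RELATION    (N'th extension segment)
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int
segment_path(char *buf, size_t buflen, const char *tsdir,
             unsigned int relid, unsigned int segno)
{
    int n;

    assert(buf != NULL && tsdir != NULL);
    if (segno == 0)
        n = snprintf(buf, buflen, "%s/%u", tsdir, relid);
    else
        n = snprintf(buf, buflen, "%s/%u/%u", tsdir, segno, relid);
    return (n < 0 || (size_t) n >= buflen) ? -1 : 0;
}

/*
 * Hypothetical helper: make sure TSDIR/N exists before extending into
 * segment N.  If the admin already made TSDIR/N (e.g. a symlink to another
 * filesystem), mkdir() fails with EEXIST and we just use what is there;
 * otherwise a plain subdirectory is created.  Returns 0 on success.
 */
static int
ensure_segment_dir(const char *tsdir, unsigned int segno)
{
    char dir[1024];
    int n;

    if (segno == 0)
        return 0;               /* first segment lives directly in TSDIR */
    n = snprintf(dir, sizeof(dir), "%s/%u", tsdir, segno);
    if (n < 0 || (size_t) n >= sizeof(dir))
        return -1;
    if (mkdir(dir, 0700) == 0 || errno == EEXIST)
        return 0;
    return -1;
}
```

Note that the backend never has to know whether TSDIR/N is a symlink; the filesystem resolves it.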

A limitation is that the N'th extension segments of all the relations
in a given tablespace have to be in the same place, but I don't see
that as a major objection. Worst case is you make a separate tablespace
for each of your multi-gig relations ... you're probably not going to
have a very large number of such relations, so this doesn't seem like
unmanageable admin complexity.

We'd still want to create some tools to help the dbadmin with slinging
all these symlinks around, of course. But I think it's critical to keep
the low-level file access protocol simple and reliable, which really
means minimizing the amount of information the backend needs to know to
figure out which file to write a page in. With something like the above
you only need to know the tablespace name (or more likely OID), the
relation OID (plus the name or not, depending on the outcome of the other argument),
and the offset in the table. No worse than now from the software's
point of view.
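A rough sketch of that mapping from (tablespace OID, relation OID, block number) to a file and a byte offset. It assumes 8K pages (BLCKSZ) and 1GB segments (RELSEG_SIZE = 131072 blocks), and hypothetically assumes a tablespace's directory is named by its OID under some root directory. page_location and the tsroot/tsoid layout are illustrative only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

#define BLCKSZ      8192     /* assumed page size in bytes */
#define RELSEG_SIZE 131072   /* assumed blocks per segment: 1GB / BLCKSZ */

/*
 * Hypothetical helper: given only the tablespace OID, relation OID and
 * block number, compute which segment file holds the page and the byte
 * offset of the page within that file.
 *   segment 0   ->  TSROOT/TSOID/RELID
 *   segment N   ->  TSROOT/TSOID/N/RELID
 * Returns 0 on success, -1 if the path buffer is too small.
 */
static int
page_location(char *path, size_t pathlen, const char *tsroot,
              unsigned int tsoid, unsigned int relid,
              unsigned int blkno, off_t *offset)
{
    unsigned int segno = blkno / RELSEG_SIZE;   /* which segment file */
    int n;

    assert(path != NULL && tsroot != NULL && offset != NULL);
    *offset = (off_t) (blkno % RELSEG_SIZE) * BLCKSZ;
    if (segno == 0)
        n = snprintf(path, pathlen, "%s/%u/%u", tsroot, tsoid, relid);
    else
        n = snprintf(path, pathlen, "%s/%u/%u/%u",
                     tsroot, tsoid, segno, relid);
    return (n < 0 || (size_t) n >= pathlen) ? -1 : 0;
}
```

Nothing beyond those three inputs is consulted: the segment number falls out of integer division, and any symlinked /N directory is resolved by the filesystem rather than by the backend.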

Comments?

regards, tom lane
