Re: Big 7.1 open items

From: JanWieck(at)t-online(dot)de (Jan Wieck)
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jan Wieck <JanWieck(at)Yahoo(dot)com>, Hiroshi Inoue <Inoue(at)tpf(dot)co(dot)jp>, Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, "Ross J(dot) Reedstrom" <reedstrm(at)rice(dot)edu>
Subject: Re: Big 7.1 open items
Date: 2000-06-18 00:10:09
Message-ID: 200006180010.CAA07405@hot.jw.home
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> JanWieck(at)t-online(dot)de (Jan Wieck) writes:
> > Tom Lane wrote:
> >> It gets a little trickier if you want to be able to split
> >> multi-gig tables across several tablespaces, though, since
> >> you couldn't just append ".N" to the base table path in that
> >> scenario.
> >>
> >> I'd be interested to know what sort of facilities Oracle
> >> provides for managing huge tables...
>
> > Oracle tablespaces are a collection of 1...n preallocated
> > files. Each table then is bound to a tablespace and
> > allocates extents (chunks) from those files.
>
> OK, to get back to the point here: so in Oracle, tables can't cross
> tablespace boundaries, but a tablespace itself could span multiple
> disks?

They can. The path in

ALTER TABLESPACE <tsname> ADD DATAFILE ...

can point to any location the db system has access to.

>
> Not sure if I like that better or worse than equating a tablespace
> with a directory (so, presumably, all the files within it live on
> one filesystem) and then trying to make tables able to span
> tablespaces. We will need to do one or the other though, if we want
> to have any significant improvement over the current state of affairs
> for large tables.
>
> One way is to play the flip-the-path-ordering game some more,
> and access multiple-segment tables with pathnames like this:
>
> .../TABLESPACE/RELATION -- first or only segment
> .../TABLESPACE/N/RELATION -- N'th extension segment
>
> [...]

In most cases all objects in one database are bound to one or
two tablespaces (data and indices). So you do an estimation
of the size required, create the tablespaces (and probably
all their extension files), then create the schema and load
it. The only reason not to do so is if your DB exceeds some
size where you have to fear not beeing able to finish online
backups before getting into Online-Relolog stuck. Has to do
the the online backup procedure of Oracle.

> This isn't any harder for md.c to deal with than what we do now,
> but by making the /N subdirectories be symlinks, the dbadmin could
> easily arrange for extension segments to go on different filesystems.
> Also, since /N subdirectory symlinks can be added as needed,
> expanding available space by attaching more disks isn't hard.
> (If the admin hasn't pre-made a /N symlink when it's needed,
> I'd envision the backend just automatically creating a plain
> subdirectory so that it can extend the table.)

So the admin allways have to leave enough freespace in the
default location to keep the DB running until he can take it
offline, move the autocreated files and create the symlinks.
What a pain for 24/7 systems.

> We'd still want to create some tools to help the dbadmin with slinging
> all these symlinks around, of course. But I think it's critical to keep
> the low-level file access protocol simple and reliable, which really
> means minimizing the amount of information the backend needs to know to
> figure out which file to write a page in. With something like the above
> you only need to know the tablespace name (or more likely OID), the
> relation OID (+name or not, depending on outcome of other argument),
> and the offset in the table. No worse than now from the software's
> point of view.

Exactly the "low-level file access" protocol is highly
complicated in Postgres. Because nearly every object needs
his own file, we need to deal with virtual file descriptors.
With an Oracle-like tablespace concept and a fixed limit of
total tablespace files (this time OS or installation
specific), we could keep them all open all the time. IMHO a
big win.

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck(at)Yahoo(dot)com #

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Jan Wieck 2000-06-18 00:20:15 Re: Big 7.1 open items
Previous Message Jan Wieck 2000-06-17 23:27:09 Re: Big 7.1 open items

Browse pgsql-patches by date

  From Date Subject
Next Message Jan Wieck 2000-06-18 00:20:15 Re: Big 7.1 open items
Previous Message Jan Wieck 2000-06-17 23:27:09 Re: Big 7.1 open items