Re: Big 7.1 open items

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Jan Wieck <JanWieck(at)Yahoo(dot)com>, Hiroshi Inoue <Inoue(at)tpf(dot)co(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, "Ross J. Reedstrom" <reedstrm(at)rice(dot)edu>
Subject: Re: Big 7.1 open items
Date: 2000-06-18 16:06:29
Message-ID: 12160.961344389@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-patches

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> ... We could even get fancy and
> round-robin through all the extents directories, looping around to the
> beginning when we run out of them. That sounds nice.

That sounds horrible. There's no way to tell which extent directory
extent N goes into except by scanning the location directory to find
out how many extent subdirectories there are (so that you can compute
N modulo number-of-directories). Do you want to pay that price on every
file open?
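To make the per-open cost concrete, here is a minimal C sketch of the round-robin lookup. It assumes a hypothetical layout in which the location directory contains nothing but extent subdirectories. The names count_extent_dirs() and extent_dir_for() are illustrative only and do not come from md.c. The point is that the divisor can be learned only through a full opendir()/readdir() pass.

```c
#include <assert.h>
#include <dirent.h>
#include <string.h>

/*
 * Hypothetical round-robin scheme: extent N lives in subdirectory
 * (N % number_of_subdirectories) of the location directory.  Assumes
 * the location directory holds nothing but extent subdirectories.
 */
static int count_extent_dirs(const char *location)
{
    DIR *dir = opendir(location);
    struct dirent *de;
    int count = 0;

    if (dir == NULL)
        return -1;              /* location missing or unreadable */
    while ((de = readdir(dir)) != NULL)
    {
        /* Skip "." and ".."; count every other entry as an extent dir. */
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        count++;
    }
    closedir(dir);
    return count;
}

/* Would run on every file open: one full directory scan per call. */
static int extent_dir_for(const char *location, int extent_no)
{
    int ndirs;

    assert(extent_no >= 0);
    ndirs = count_extent_dirs(location);
    if (ndirs <= 0)
        return -1;
    return extent_no % ndirs;
}
```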

Worse, what happens when you add another extent directory? You can't
find your old extents anymore, that's what, because they're not in the
right place (N modulo number-of-directories just changed). Since the
extents are presumably on different volumes, you're talking about
physical file moves to get them where they should be. You probably
can't add a new extent without shutting down the entire database while
you reshuffle files --- at the very least you'd need to get exclusive
locks on all the tables in that tablespace.
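The reshuffle cost can be counted directly. This sketch uses the same assumed N % ndirs rule and a hypothetical helper, extents_to_move(). It counts how many of the first nextents extents land in a different directory when the directory count changes:

```c
#include <assert.h>

/*
 * Under the hypothetical round-robin rule (extent N lives in directory
 * N % ndirs), count how many of extents 0..nextents-1 map to a different
 * directory when the number of extent directories changes from
 * old_ndirs to new_ndirs.  Each one is a physical file move.
 */
static int extents_to_move(int nextents, int old_ndirs, int new_ndirs)
{
    int moved = 0;

    assert(old_ndirs > 0 && new_ndirs > 0);
    for (int n = 0; n < nextents; n++)
    {
        if (n % old_ndirs != n % new_ndirs)
            moved++;
    }
    return moved;
}
```

For example, going from 3 to 4 directories relocates 9 of the first 12 extents. Only extents with N mod 12 in {0, 1, 2} stay put, so over the long run three quarters of all extent files would have to move.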

Also, you'll get filename conflicts from multiple extents of a single
table appearing in one of the recycled extent dirs. You could work
around it by using the non-modulo'd N as part of the final file name,
but that just adds more complexity and makes the filename-generation
machinery that much more closely tied to this specific way of doing
things.
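Here is a sketch of the collision and of the workaround. The path builders and the relation name "mytable" are hypothetical, and paths are relative to the location directory:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef void (*path_fn)(char *buf, size_t len, const char *relname,
                        int extent_no, int ndirs);

/* Only the directory varies, so extents 0 and ndirs collide. */
static void path_without_n(char *buf, size_t len, const char *relname,
                           int extent_no, int ndirs)
{
    snprintf(buf, len, "%d/%s", extent_no % ndirs, relname);
}

/* Workaround: append the un-modulo'd N to the file name. */
static void path_with_n(char *buf, size_t len, const char *relname,
                        int extent_no, int ndirs)
{
    snprintf(buf, len, "%d/%s.%d", extent_no % ndirs, relname, extent_no);
}

/* Returns 1 if extents a and b map to the same path under builder f. */
static int collides(path_fn f, const char *relname, int a, int b, int ndirs)
{
    char pa[256], pb[256];

    assert(ndirs > 0);
    f(pa, sizeof pa, relname, a, ndirs);
    f(pb, sizeof pb, relname, b, ndirs);
    return strcmp(pa, pb) == 0;
}
```

With 3 directories, extents 0 and 3 both map to 0/mytable without the workaround. With it, they map to 0/mytable.0 and 0/mytable.3. That is exactly the extra coupling between filename generation and the placement rule.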

The right way to do this is that extent N goes into extents subdirectory
N, period. If there's no such subdirectory, create one on-the-fly as a
plain subdirectory of the location directory. The dbadmin can easily
create secondary extent symlinks *in advance of their being needed*.
Reorganizing later is much more painful since it requires moving
physical files, but I think that'd be true no matter what. At least
we should see to it that adding more space in advance of needing it is
painless.
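By contrast, the fixed rule needs nothing but string formatting. This minimal sketch assumes a hypothetical "location/N/relname" layout, since the actual naming scheme was still under discussion in this thread:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical fixed placement: extent N always lives in subdirectory
 * "N" of the location directory, whether that is a plain directory or
 * a symlink to another volume created in advance.  No directory scan,
 * and no dependence on how many extent directories exist.
 * Returns the path length, or -1 if buf was too small.
 */
static int extent_path(char *buf, size_t len, const char *location,
                       const char *relname, int extent_no)
{
    int n;

    assert(extent_no >= 0);
    n = snprintf(buf, len, "%s/%d/%s", location, extent_no, relname);
    /* Report truncation instead of silently using a wrong path. */
    return (n >= 0 && (size_t) n < len) ? n : -1;
}
```

This code cannot tell whether /data/ts1/2 is a plain subdirectory or a symlink to another volume. That is what lets the admin add space ahead of time without moving anything.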

It's possible to do it that way (auto-create extent subdir if needed)
without tying the md.c machinery very closely to a specific filename
creation procedure: it's just the same sort of thing as install programs
customarily do. "If you fail to create a file, try creating its
ancestor directory." We'd have to think about whether it'd be a good
idea to allow auto-creation of more than one level of directory; offhand
it seems that needing to make more than one level is probably a sign of
an erroneous path, not need for another extent subdirectory.
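The "if you fail to create a file, try creating its ancestor directory" idea can be sketched with plain POSIX calls. Several details here are hypothetical choices rather than md.c code: the function name, the O_RDWR | O_CREAT flags, and the 0600/0700 modes. The sketch creates at most one missing directory level, as suggested above:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Open (creating if needed) an extent file.  If open() fails because
 * the containing directory is missing, create that one directory and
 * retry once.  If the parent's parent is missing too, mkdir() fails
 * with ENOENT and we give up: that suggests a bad path, not a missing
 * extent subdirectory.
 */
static int open_extent_file(const char *path)
{
    char dir[1024];
    char *slash;
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd >= 0 || errno != ENOENT)
        return fd;

    /* Derive the immediate parent directory of path. */
    if (strlen(path) >= sizeof dir)
        return -1;
    strcpy(dir, path);
    slash = strrchr(dir, '/');
    if (slash == NULL || slash == dir)
        return -1;              /* no parent component to create */
    *slash = '\0';

    /* EEXIST is fine: someone else may have just created it. */
    if (mkdir(dir, 0700) != 0 && errno != EEXIST)
        return -1;

    return open(path, O_RDWR | O_CREAT, 0600);
}
```

Treating EEXIST from mkdir() as success covers the case where another backend creates the same extent directory between our failed open() and our mkdir().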

regards, tom lane
