Re: [HACKERS] Proposed Changes to PostgreSQL

From: Peter Eisentraut <e99re41(at)DoCS(dot)UU(dot)SE>
To: Chris Bitmead <chris(at)bitmead(dot)com>
Cc: "pgsql-hackers(at)postgreSQL(dot)org" <pgsql-hackers(at)postgreSQL(dot)org>, pgsql-sql(at)postgreSQL(dot)org, pgsql-general(at)postgreSQL(dot)org
Subject: Re: [HACKERS] Proposed Changes to PostgreSQL
Date: 2000-02-03 16:26:50
Message-ID: Pine.GSO.4.02A.10002031705390.10941-100000@Katt.DoCS.UU.SE
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers pgsql-sql

While I think that these kinds of changes are a No Go because they'd break
a lot of applications (including mine), IF (big if) you really want to
make major changes to the inheritance scheme, I got a few ideas.

First let me say that I like the conceptual simplicity of relational
databases. Some or all of the ideas thrown around here break with
simplicity and consistency, by suggesting, e.g., that some commands be
allowed only on entire inheritance structures, while others be allowed on
individual tables, and attached to it a discussion which ones those should
be. That doesn't strike me as too promising.

A lot of people use inheritance to create "consistent schemas", that is,
they empty create base tables, such as "address" which are inherited by
tables such as customer, vendor, office, etc. That is probably not what
inheritance is for, perhaps it should be some sort of a macro-like
concept, such as create table vendor (name text,
copy_schema_from(address), more fields), expanded by the parser. This is
pretty much what it does now, only this scheme wouldn't have to actually
store the (useless) inheritance link.

Anyway, an idea I had would be to reimplement inheritance based on joins,
since this is what the "pure relational" solution would be anyway. When I
create a table B that is based on A, all the system does is create the
table B as usual and store a note "I inherit from A". Any row you insert
into B also creates a row in A, and the row in B contains an oid pointer
to it. Thus a select on B performs a join on A.oid and B.row_in_A_pointer.
A select on A just returns all the rows in A, no extras needed. A delete
on B deletes the row in B and in A. A delete in A would cascade to B. Both
of this can be gotten for free with foreign keys. Adding a column to A
just adds the column to A, all other tables get the new column magically
and in the right order. Same with dropping columns, etc.

In short, this approach solves all inheritance problems at once and does
so without adding any extra kludges besides the "I inherited from" field,
which is static, plus the necessary transformations necessary in the
parser. The drawback is of course that a select from an inherited table
would always incur a join, perhaps some optimizing could be done in this
direction. But the bottom line is that the compatibility issue looms big.

-Peter

On Thu, 3 Feb 2000, Chris Bitmead wrote:

> Hi,
>
> I've been spending a lot of time lately with gdb and tracing the
> back-end seeing if I can understand it enough to make some changes.
> I'm starting to actually understand a lot of stuff, so in order
> to have some possibility of having my changes accepted, I want to
> discuss
> them here first. Based on that, I'm going to hopefully make an attempt
> at implementation. I have a patch for one of these changes already
> if I get the go ahead.
>
> THESE CHANGES DON'T AFFECT YOU IF YOU DON'T USE INHERITANCE.
>
> Speak now about these changes or please, forever hold your peace. Of
> course you can comment later if I screw up implementation.
>
> The proposed changes are....
>
> 1) An imaginary field in every tuple that tells you the class it came
> from.
> This is useful when you select from table* and want to know which
> relation the object actually came from. It wouldn't be stored on disk,
> and like oid it wouldn't be displayed when you do SELECT *. The field
> would be called classname. So you could have...
> SELECT p.classname, p.name FROM person p;
> person | Fred
> student | Bill
> employee | Jim
> person | Chris
>
> If you want to know the exact behaviour it is as if every table in the
> database had done to it...
> ALTER TABLE foo ADD COLUMN classname TEXT;
> UPDATE foo SET classname='foo';
>
> Of course this is not how it would be implemented. It is just
> reference for how it will appear to work. BTW, this idea was also
> in the original berkeley design notes.
>
> 2) Changing the sense of the default for getting inherited tuples.
> Currently you only get inherited tuples if you specify "tablename*".
> This would be changed so that you get all sub-class tuples too by
> default unless you specify "ONLY tablename". There are several
> rationale for this. Firstly this is what Illustra/Informix have
> implemented. Secondly, I believe it is more logical from an OO
> perspective as well as giving a more useful default. If a politician
> IS a person and I say SELECT * from person, then logically I should
> see all the politicians because they are people too (so they claim
> :). Thirdly, there are a whole range of SQL statements that should
> probably be disallowed without including sub-classes. e.g. an ALTER
> TABLE ADD COLUMN that does not include sub-classes is almost certainly
> undesirable. It seems ashame to have to resort to non-standard SQL
> with the "*" syntax in this case when it is really your only
> choice. Basicly, wanting ONLY a classname is a far more unusual
> choice, and leaving off the "*" is a common error. Fourthly, it seems
> out of character for the SQL language to have this single character
> operator. The SQL style is to use wordy descriptions of the operators
> meaning. "ONLY" fits well here because it describes its own meaning
> perfectly whereas to the unitiated, "*" is harder to guess at. While
> this change is an incompatibility I hope for those few people using
> inheritance they can accept the need to move forward without
> over-burden of backwards compatibility.
>
> 3) The ability to return different types of rows from a SELECT. This
> is to allow implementation of ODBMS functionality where a query could
> be required to instantiate objects of differing types with differing
> attributes.
>
> I would propose that that anytime you do a SELECT * from a base table
> that you would get back the full rows from those sub tables. Since the
> current PQ interface which doesn't support this notion would remain
> unchanged this wouldn't affect current users.
>
> It's probably also desirable to have a syntax for getting just the
> columns of the base table when this is desired. Say perhaps SELECT %
> from table. This would be a performance hack for users of libpq and a
> functionality difference for users of psql.
>
> The reason I think the "*" syntax should take on the new functionality
> is because it would be more consistent with what the OQL (object query
> language) standard specifies, and also because it seems the more
> useful default. Also there is no compatibility reason not to do it.
>
> In addition it would be legal to specify columns that only exist in
> sub-classes. For example, if we had
>
> CREATE TABLE person (name TEXT);
> CREATE TABLE student (studentid TEXT, faculty TEXT) INHERITS (person);
>
> it would be legal to say...
> > SELECT * FROM person;
> NAME
> ----
> Fred
> Bill
>
> NAME | STUDENTID | FACULTY
> --------------------------
> Jim | 23455 | Science
> Chris| 45666 | Arts
>
> > SELECT *, studentid FROM person;
> NAME
> ----
> Fred
> Bill
>
> NAME | STUDENTID
> ----------------
> Jim | 23455
> Chris| 45666
>
> > SELECT *, studentid FROM ONLY person;
> ERROR: person does not contain studentid.
>
> > SELECT % FROM person;
> NAME
> ----
> Fred
> Bill
> Jim
> Chris
>
> As you can see, it is desirable that psql be modified to be able to
> print these differing tuple types. Presumably new column headings will
> be printed when a tuple is differing to the previous one. Likely it
> will be often desirable to do a
> SELECT * FROM person p ORDER BY p.classname;
> in order to have all the tuples of a particular type grouped together.
>
> In addition some extenions will be done to the PQ interface to support
> these differing return types. The current PQ interface will be left
> unchanged and backwards compatible for retrieving rows of a single
> type.
>
> Also there should be an settable option that specifies that "*" should
> also return the normally ignored columns of oid and classname. This is
> so that OO programs that embed SQL into them also get back the oid and
> classname which are required for the behind the scenes implementation
> of an ODMG client. Something like...
>
> SET SHOW_OID TRUE;
> SHOW_CLASSNAME TRUE;
>
> SELECT * FROM person;
>
> OID CLASSNAME NAME
> -------------------
> 2344 person Fred
> 3445 person Bill
>
> OID CLASSNAME NAME | STUDENTID | FACULTY
> -----------------------------------------
> 2355 student Jim | 23455 | Science
> 5655 student Chris| 45666 | Arts
>
> ************
>
>

--
Peter Eisentraut Sernanders vaeg 10:115
peter_e(at)gmx(dot)net 75262 Uppsala
http://yi.org/peter-e/ Sweden

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Taral 2000-02-03 16:50:30 Re: [HACKERS] Re: [SQL] Proposed Changes to PostgreSQL
Previous Message Tom Lane 2000-02-03 16:26:11 Re: [HACKERS] Re: [SQL] Proposed Changes to PostgreSQL

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2000-02-03 16:28:07 Re: [HACKERS] SERIAL type isn't listed...?
Previous Message Tom Lane 2000-02-03 16:26:11 Re: [HACKERS] Re: [SQL] Proposed Changes to PostgreSQL

Browse pgsql-sql by date

  From Date Subject
Next Message Peter Eisentraut 2000-02-03 16:43:43 Re: [SQL] INSTALL doc correction ...
Previous Message Tom Lane 2000-02-03 16:26:11 Re: [HACKERS] Re: [SQL] Proposed Changes to PostgreSQL