Relational Inheritance Features.

From: Trent Shipley <tshipley(at)deru(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Relational Inheritance Features.
Date: 2006-01-06 17:31:49
Message-ID: 200601061031.49292.tshipley@deru.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Relational Inheritance Supporting Features

Perhaps the most important deficit in Postgresql's current INHERITS model is
hard to detect. That deficit is the inability to name inheritance classes
themselves. One has to refer to the _per se_ class by referencing an
associated table. While it is nice to have anonymous classes, it is highly
desirable that classes be first-class objects, and at a minimum a proper
object should be able to take a name in its own right.[name]

Since a class' interface is the tuple for the associated table, classes should
not be directly instantiated unless the class owns (a possibly anonymous)
table. As first-class objects, however, one should be able to DECLARE or
DEFINE relational inheritance classes (or rather, the root for a tree or
branch) [define]. A define function is useful when creating a relational
inheritance analog to a pure abstract class; that is, a class that cannot be
directly instantiated. (An pure abstract class is not to be confused with an
abstract class table.)

Actually, lack of DEFINE is a major gap in SQL itself. Most modern
programming languages make a distinction between *defining* a “multi-variable
precursor” (sort of a light weight type) and *instantiating* a variable. In
SQL, definition is always conflated with instantiation using a CREATE
expression. It should be possible to DEFINE database objects and then CREATE
the objects; create them after an indefinite interval, and perhaps
repeatedly. One should not have to repeatedly run verbose scripts.

Even more than completing than use in some still theoretical relational
inheritance model, DEFINE would be of immediate practical use:

CREATE temporary tables, that often occur as clones and that are repeatedly
created and dropped.
CREATE views that are often dropped then recreated.
Creating indexes that are repeatedly dropped.
Managing frequently dropped constraints.
If one could define a free-floating or “immaculate” index, it could be
repeatedly instantiated and attached to columns in different tables.
If one could define an immaculate constraint, it could be repeatedly created
and attached to tables in a schema.

1.1 Some operations on relational classes.

With fully developed relational inheritance it would be highly desirable to
re-order columns in descendant tables. The first reason is a strong
aesthetic argument. One might, for example, always want some audit columns
to be the last five values in any table in a schema. A practical argument is
that one could create distinct class signatures by arbitrarily recombining
the order of columns. Naturally, it would be convenient, if one could also
refer to columns by name in procedural languages.

Relational inheritance classes should also support a form of polymorphism.
Specifically, the name of an ancestor table (or the table itself) and the
fully or sufficiently specified columns of the row's ordered form can be
regarded as a distinct signature [signature]. When selecting, inserting, or
updating signatures should be polymorphic. Note that an incautious user (or
overly permissive implementation, depending on perspective) could create
ambiguous queries. These could either be resolved by a tree traversal rule
or detected and errored out. Making ambiguous structures or ambiguous
queries errors is obviously required by good design.

1.2 Extending relations or classes

As I have mentioned elsewhere multiple inheritance requires merging multiple
parent classes. Merging multiple ancestries implies a mathematical order of
operations. It follows that when an INHERITS clause specifies more than one
table, the designer should be able to use parentheses to explicitly specify
the order for merging parent tables.

Of more general interest is that table attributes have scope. This is a new
dimension of complexity that does not exist in pure relational databases.
For practical purposes, I anticipate that it will be sufficient only to
specify local and subclass scope with phrases using the words LOCAL and
CLASS. Columns, constraints, indexes, and rules and triggers (event actions)
can have either local or (sub)class scope.

Local scope implies that the property (acts as if) it were created an managed
independently for the top of the class, and possibly for each descendant.
Note that how local properties are treated by inheritance is somewhat
ambiguous. Should these dependent objects be created or changed in
descendants? Since the answer is not clear I propose that the behavior must
be explicitly specified. When a local column, constraint, index, or event
action is created, altered, or dropped one must specify whether or not the
action cascades to descendants. (It should *not* cascade by default.)
Likewise, when a table inherits from a class with local properties one should
specify whether local properties are inherited. They should be inherited by
default.

Class constraints do not suffer from the semantic ambiguity of local
constraints. They are always inherited and any changes always apply to
descendants. That is what “class scope” means. A class property acts as if
there were a single instance of the property for the entire composite
relation that is the class.

From a practical perspective, the distinction between a local and a class
property will normally be invisible to a report writer, except in the case of
a unique constraint. The behavior of unique constraints is slightly more
complicated than the behavior of other properties with respect to scope.
Local unique constraints must be maintained separately for on each table
within the class. A (sub)class unique constraint will be maintained as a for
the entire class composite relation as a unit.

Creating a new class or adding new properties to a class is generally not
problematic. One ought to be able to DEFINE a new class either by declaring
a class _per se_ or by defining or creating a class aware table. Naturally,
it should be possible to create trivial classes consisting of just one table,
but it is not possible to do this with the current INHERITS grammar.

Obviously, adding a class or table to a class using the INHERITS clause is not
problematic.

As we consider defining and instantiating, it is worth asking whether there is
a place for the anonymous table. In Postgresql a table can be contained in
an anonymous class. Symmetry implies that a class should be able to contain
an anonymous table. Presumably this would be done by defining a class,
omitting the clause that names the class' table, then instantiating the
class.

Furthermore, one should experience no problems (baring namespace conflicts)
when adding columns, indexes, or event actions to a class.

Adding constraints, however, can be problematic. This is because they can
conflict with existing data either in the table itself or in its descendants
[immaculate].

Note that when working with relational inheritance (as with constraint
exclusion) one would often want to specify an abstract CHECK constraint by
what might be called family or type. For example, all constraints should
perhaps be a range of BIGINT of the form A <= B, B < C. Date constraints
also often take this form. Though admittedly a somewhat fuzzy concept, it
would be useful if one could specify check constraint domains or templates.

Obviously, one expects that descendant tables inherit the properties of their
parent class. These properties include the columns of the ancestor table,
the column order of the ancestor table, ancestor constraints, indexes, and
event actions. One currently would override a constraint with ALTER
TABLE ... DROP CONSTRAINT. Then, if desired, another constraint could be
added. Though effective, using ALTER TABLE to override table properties can
be a bit difficult for a human code reader to scan. Readability might be
improved if table columns could be declared as explicitly PLURAL or
explicitly UNCONSTRAINED.

Beyond what one normally thinks of as database constraints, a database
implementing relational inheritance model should include several controls
over how class properties are inherited. These controls constrain the
behavior of descendants. Descendant constraints thus control the behavior of
database objects rather than data so they can be thought of as a type of
meta-constraint.

Scope is a descendant constraint that has already been discussed. In theory
one could specify a grammar to scope inheritance to any list of generations.
In practice the ability to specify this generation (LOCAL) or all descendants
(CLASS) should suffice.

Whether an object is abstract or concrete is also a kind of descendant
constraint. There are a couple of different interpretations of “ABSTRACT”
applicable to relational inheritance. Commonly encountered is the case of a
completely specified ancestral template table that should never contain data
(even though it could). In another case the object is not fully specified.
Indeed, the object definition may be nearly empty. In this case the object
cannot contain data. In either event, before data can be inserted into an
abstract table (more properly an abstract class) a CONCRETE table must be
derived from the parental abstract class' table.

Abstraction makes sense for classes, tables, and the table properties of
columns, constraints, and
event actions. Though hardly nonsense, there may be little practical point in
declaring an index abstract. A class may be declared abstract in its own
right as it is defined. (Indeed, abstraction implies the separation of
definition and instantiation.) A table cannot directly inherit from an class
that has been directly declared abstract. Instead it would be necessary to
derive another class from the directly abstract class. Note that an
inheritance class has one direct property, its table. If its table is
abstract then the owning class is indirectly abstract.

A table may be directly declared abstract. In addition, if any of a table's
columns, constraints, indexes, or event actions are abstract the table is
indirectly abstract. It is reasonable to expect that abstract columns,
constraints, and perhaps indexes will be abstract as incompletely specified
placeholders. An event action could also be abstract as an artifact of an
object oriented programming language.

In addition, a class or table could be FINAL.

Columns, constraints, indexes, and event actions could all in theory control
descendant scope. One could, for example, oblige a unique constraint in all
descendant tables to have local scope. Similarly, one could provide a
feature to lock out overriding a given column, constraint, index, or event
action in descendant tables. Likewise, one should be able to explicitly
allow, or even require overriding.

Constraints (or constraint types), indexes, and event actions could all be
forbidden, allowed, or required in descendant tables.

Any time a class or is derived from another, the default behavior is that any
class property (or sub-property) can be overridden. To derive one class from
another in Postgresql one would use the INHERITS clause in the context of a
DEFINE or CREATE for a class or table. This is the obvious time to alter
table properties.

Using ALTER TABLE (and perhaps ALTER CLASS) present another opportunity to
override table properties. Indeed, in the case of indexes ALTER may be the
only way to override ancestral table properties.

At DEFINE or CREATE time I do not see overriding constraints, indexes, or
event actions as particularly problematic. Overriding columns could be more
problematic, however. To avoid casting up, it may be necessary to require
that the type of an overridden column effectively be a domain of its
ancestors [domain].

The other notable implication of, for example, constraint overriding, is that
name spaces become ramified. If class-table bar inherits from class-table
foo and foo's constraint acme, then when bar overrides constraint acme as
acme' we need to distinguish between the two acmes with identical local
names. We need foo'acme and foo'bar'acme. Postgresql does not currently
support ramified name spaces.

Modifying extant properties of existing classes (that is, using ALTER), is
more problematic than creating a property or overriding a property when a
class or table is first defined. Changing a column produces the same problem
that it would for a single table, except that the data incompatibility
problem when changing to a more restrictive type extends to the entire
composite table and not just the root table itself. Adding or changing
constraints also have the problem of being more restrictive than existing
data in the composite table. In addition, if the table is in a descendant
class, changing a child property (especially a constraint) could conflict
with an ancestor's descendant constraint.

To DROP classes or class properties is even more involved than altering a
class. Dropping an entire class from the network might or might not be
problematic depending on its location. When a class is a leaf, it can always
be dropped. When the class is an intermediate node there are at least four
options options. Deleting the class can be forbidden, the entire subtree can
be pruned, all the subtrees formerly rooted in the deleted node can be freed
as stand-alone classes, or one can try to merge any orphan subtrees into the
existing tree. Two approaches suggest themselves as merge strategies. The
deleted node could be replaced with a completely empty and anonymous class.
Alternatively, one could try to make the orphan subtrees inherit directly
from the deleted class' parent(s).

When the deleted node is the top node (and not also a leaf node) ones options
are similar to those for intermediate nodes, except that the option of
grafting orphans to their grandparent(s) is unavailable.

Note that in the model suggested here it is suggested that dropping a class'
table should be distinct from dropping the class _per se_. DROP TABLE ought
to result in a completely abstract class node. Deleting a table in a leaf
node is no problem. When other objects depend on the deleted table the
action the relevant actions are to either forbid the DROP TABLE or merge
descendant class-tables with the now empty, abstract class.

As a side note, TRUNCATE also would need versions for single tables and
composite tables.

Dropping columns, constraints, indexes, or event actions can all get tricky
when it comes to scope. In particular, one would want a way to say “drop
this column even when it has been overridden.” Furthermore, dropping a
property that was REQUIRED by an ancestor table would cause the attempt to
delete the property to fail.

1.3 Relational X Network (Would relational inheritance need support outside
SQL)

In Postgresql at present ONLY is the only tool for navigating an inheritance
net. I have already suggested that much more sophisticated tools for naming
class nodes will be *needed*. Many more, along the lines of XPath, would no
doubt be desirable.

The problem of insufficient network navigation tools pales compared to the
related problem of whether users would be satisfied with SQL as the query
tool for the network organization that would be part of a database with
relational inheritance. Would users feel the need for a network oriented
query language like XQuery?

1.4 Relations other than tables

There is also the question of whether relations other than tables might
benefit from features of relational inheritance. In Postgresql all of the
following can be regarded as relations: tables (of various flavors), indexes,
foreign constraints, unique constraints, views, and relational classes.

One areas of particular interest is the degree to which defined but
uninstantiated relations should be treated the same as instantiated
relations. Application of relational inheritance to views would also be of
great interest once the problem of relational inheritance and tables was in
hand.

2 Conclusion

The most important point made in this essay is that some operations would be
*very* useful indeed for database design in general even if there were no
provision for relational inheritance. First, relational databases should
separate definition from instantiation. In particular Postgresql should
adopt a “DEFINE object definition_name / CREATE object_name FROM
definition_name” feature in advance of its competitors. The ability to
define objects will come in very handy for temporary tables, creating
variations on a table theme, and so on. It will also eliminate the need to
repeatedly run verbose SQL scripts.

The second insight of general interest is that it should be possible to
recycle constraint and index declarations. Furthermore, it should be
possible to create immaculate definitions of constraints and indexes that are
not attached to any particular table or view. Immaculate properties would
eliminate the need to repeatedly script nearly identical constraints and
indexes across large numbers of relations in a schema.

For features germane only to classes the most important observation is that
relational inheritance classes should be first class database objects. Most
important, classes should be able to be named. Also it should be possible to
directly declare classes. Developers should be able to create empty classes.
Even more important, they should be able to create a class with only one
table [multiple].

In addition, any reasonably complete implementation would need to allow for
local and class scope, descendant constraints and cascade controls for ALTER
and DROP.

Even more important is the question of what is to be gained by relational
inheritance. I have run across design problems where relational inheritance
with sub-classing primary keys would have been *very* convenient. One
imagines, however, that from the perspective of a database designer trained
in relational design, the hierarchical network dimension of relational
inheritance will be “abused”. (Of course, “abuse” would be a matter of
perspective.)

Not having much personal experience, my arm-chair expectation for current
practice is that OO application and relational database schemata are
developed in separately. Moving data from the application to the database
thus requires a translation layer. (The translation layer usually being _ad
hoc_ and distributed throughout the application rather than being a discrete
module.) Report writers, however, can directly query the “well designed”
relational schema.

With the object oriented potential of relational inheritance, I would expect
to see a development pattern where an object oriented program is translated
into an object oriented schema. One could then build a translation layer of
views allowing indirect query of the underlying data, but what one really
expects is that maintenance database developers and report writers will have
to directly query an object oriented network database using SQL on an _ad
hoc_ basis.

My own strong suspicion is that with relational inheritance many object
oriented programmers will design their own schemata. Or, in what will amount
to the same thing, schemata will be developed based directly on object
oriented analyses. Another likely scenario is that when a project gets
behind, relational schemata get thrown out to speed application development.

The net effect will be to simply move complexity downstream rather than
dealing with it early on. Object oriented programmers will have a trivial
problem putting data into databases slavishly parallel to object oriented
data structures. At the other end of the development food chain, report
writers will have a devil of a time writing SQL to extract data from what
will have effectively become network databases.

------

[name]: Perhaps one could add a clause to CREATE TABLE like [CLASS [NAME] [IS]
_class_name_].

[define]: One would choose DECLARE, DEFINE, or some other phrase based on what
has not been reserved by the current SQL standard. Hereafter, I use “define”
and “DEFINE”.

[signature]: For those who prefer safety to power, a signature could be based
on the fully or sufficiently specified *unordered* columns of a row and an
ancestor class or table name.

[immaculate]: It seems a shame that constraint names can only be used to
indicate what constraint one wants to drop. Where there is a name, there is
an invitation for reuse. In all database schemata I have known check
constraints and foreign key constraints (and indexed columns) are repeatedly
declared with all but identical lengthy SQL expressions. When a constraint
is first declared in-line how much better it would be if the initial form
could be reused.

An even better solution would be the ability to define the form of a
constraint (or index) without reference to a concrete table. I think of this
as an “immaculate” definition of a normally dependent object. One could then
instantiate multiple objects based on the immaculate definition everywhere
the otherwise tediously repetitive constraint declaration would be needed.
This conforms to the best practice of storing information once and reusing it
many times. It should cut down on errors and reduce maintenance overhead.

[domain]: As an example of overriding with domains BIGINT could beget BIGINT
that begets INTEGER that begets the user defined domain of whole_integer
(where whole_integer is not to be confused with unsigned two octet integer).

[multiple]: For the sake of simplicity I have assumed that each class
contained either zero or one tables. Not discussed in this paper is the
possibility that a class could hold multiple tables (or mix of tables, views,
relational inheritance classes, ...). For example, class foo contains tables
A, B, and C. Class bar inherits from foo and thus contains A', B', and C'.

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Michael Fuhr 2006-01-06 17:35:58 Re: Indexes works only on miss
Previous Message Bruno Almeida do Lago 2006-01-06 17:26:27 Strange behavior