Re: Re: From TODO, XML?

From: mlw <markw(at)mohawksoft(dot)com>
To: Ken Hirsch <kenhirsch(at)myself(dot)com>
Cc: "Frank Ch(dot) Eigler" <fche(at)redhat(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: From TODO, XML?
Date: 2001-07-29 16:19:48
Message-ID: 3B6437A4.6228C63D@mohawksoft.com
Lists: pgsql-hackers

Ken Hirsch wrote:
>
> mlw <markw(at)mohawksoft(dot)com> wrote:
>
> > "Frank Ch. Eigler" wrote:
> > > : So a parser that can scan a DTD and make a usable create table (...)
> > > : line would be very helpful. [...]
> > >
> > > Hmm, but hierarchically structured documents such as XML don't map
> > > well to a relational model. The former tend to be recursive (e.g.,
> > > have more levels of containment than the one or two that might be
> > > mappable to tables and columns.)
> >
> > Yes!!! Exactly, being able to understand the recursive nature of XML and
> > create relations on the fly would be a very cool feature.
>
> I think there is a pretty straightforward mapping, except for one possible
> ambiguity.
>
> If an element, say <address>, is contained within another element, say
> <employee>, it could either be a column (or group of columns) in an Employee
> table, or it could be a table Address which references Employee.
>
> When you say "create relations on the fly", what exactly do you mean? I can
> see it would be handy to have CREATE TABLE statements written for you, but
> it seems likely that a human would want to edit them before the tables are
> actually created. You cannot infer much type information from the DTD. I
> don't think there's a way to infer a primary key from a DTD, so you would
> want to either specify one or add a serial column (or perhaps that would
> always be done automatically). An XML schema would have more information,
> of course.

I have been thinking about this. A lot of guessing would have to be done, of
course. But unless some extra information is specified, when one XML record is
contained within another, the parser would have to generate its own primary key
and a sequence for each table. Obviously, the user should be able to specify
the primary key for each table, but lacking that input, the XML parser/importer
should do it automatically.

So, take this:

<employee>
  <name>Bill</name>
  <position>Programmer</position>
  <address>
    <number>1290</number>
    <street>
      <name>Canton Ave</name>
    </street>
    <town>
      <name>Milton</name>
    </town>
  </address>
</employee>

The above is almost impossible to convert to a relational format without
additional information or a good set of rules. However, we can determine which
XML tags are "containers" and which are "data": "employee" is a container
because it has sub tags, while "position" is data because it has none.
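To make the container/data rule concrete, here is a minimal Python sketch using
the standard xml.etree.ElementTree module. The function name classify and the
path notation are made up for illustration; the only rule applied is the one
stated above (a tag with sub tags is a container, otherwise it is data).

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<employee>
  <name>Bill</name>
  <position>Programmer</position>
  <address>
    <number>1290</number>
    <street><name>Canton Ave</name></street>
    <town><name>Milton</name></town>
  </address>
</employee>
"""

def classify(elem, path=""):
    """Yield (path, kind) for elem and every descendant (hypothetical helper)."""
    here = f"{path}/{elem.tag}"
    # A tag with sub tags is a "container"; a tag without any is "data".
    kind = "container" if len(elem) else "data"
    yield here, kind
    for child in elem:
        yield from classify(child, here)

kinds = dict(classify(ET.fromstring(SAMPLE)))
for path, kind in kinds.items():
    print(f"{kind:9} {path}")
```

Note that an empty tag would be classified as data by this rule, which may or
may not be what you want.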

We can recursively scan this hierarchy and decide which tags are containers and
which are data. Each data tag gets assigned an appropriate SQL type. Each
container gets separated from its parent container, and an integer index is put
in its place. For each container, a primary key is either specified or created
on the fly.
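A rough sketch of that scan in Python follows. Everything here is an
assumption for illustration: the helper names (guess_type, derive_tables, ddl),
the generated "id" column, the "<tag>_id" naming for the integer index, and the
very crude type guess (all digits means INTEGER, anything else TEXT). It also
assumes each container tag name maps to exactly one table, and it does not
quote column names that might clash with SQL keywords.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<employee>
  <name>Bill</name>
  <position>Programmer</position>
  <address>
    <number>1290</number>
    <street><name>Canton Ave</name></street>
    <town><name>Milton</name></town>
  </address>
</employee>
"""

def guess_type(text):
    # Crude guess from a single sample value: digits only -> INTEGER, else TEXT.
    return "INTEGER" if (text or "").strip().isdigit() else "TEXT"

def derive_tables(elem, tables=None):
    """Return {table: [(column, sql_type), ...]}, sub containers listed first."""
    if tables is None:
        tables = {}
    # No key was specified, so one is created on the fly.
    cols = [("id", "SERIAL PRIMARY KEY")]
    for child in elem:
        if len(child):
            # Sub container: it becomes its own table, and an integer
            # index referencing it is put in its place in the parent.
            derive_tables(child, tables)
            cols.append((f"{child.tag}_id", f"INTEGER REFERENCES {child.tag}(id)"))
        else:
            cols.append((child.tag, guess_type(child.text)))
    tables[elem.tag] = cols
    return tables

def ddl(tables):
    """Render one CREATE TABLE statement per derived table."""
    return [
        f"CREATE TABLE {name} (" + ", ".join(f"{c} {t}" for c, t in cols) + ");"
        for name, cols in tables.items()
    ]

tables = derive_tables(ET.fromstring(SAMPLE))
statements = ddl(tables)
print("\n".join(statements))
```

Because sub containers are visited before their parent is recorded, the
statements come out in an order where every referenced table already exists.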

We insert the sub containers first and pass each primary key value back up to
the parent, until we have the whole record. The primary key could even be the
OID.
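The insertion order can be sketched as a small recursive function. To keep the
example runnable anywhere, it uses Python's built-in sqlite3 module with an
in-memory database as a stand-in for PostgreSQL; the function name insert, the
"id" key column, and the "<tag>_id" reference columns are all assumptions, not
an existing importer. It expects to be called on a container, and it creates
each table from the first record it sees.

```python
import sqlite3
import xml.etree.ElementTree as ET

SAMPLE = """
<employee>
  <name>Bill</name>
  <position>Programmer</position>
  <address>
    <number>1290</number>
    <street><name>Canton Ave</name></street>
    <town><name>Milton</name></town>
  </address>
</employee>
"""

def insert(conn, elem):
    """Insert a container and return the primary key generated for it."""
    row = {}
    for child in elem:
        if len(child):
            # Sub container: insert it first, then keep only its key here.
            row[f"{child.tag}_id"] = insert(conn, child)
        else:
            row[child.tag] = (child.text or "").strip()
    cols = ", ".join(f'"{c}"' for c in row)
    marks = ", ".join("?" for _ in row)
    # The generated key: SQLite assigns INTEGER PRIMARY KEY values itself.
    conn.execute(
        f'CREATE TABLE IF NOT EXISTS "{elem.tag}" (id INTEGER PRIMARY KEY, {cols})'
    )
    cur = conn.execute(
        f'INSERT INTO "{elem.tag}" ({cols}) VALUES ({marks})', list(row.values())
    )
    return cur.lastrowid

conn = sqlite3.connect(":memory:")
emp_id = insert(conn, ET.fromstring(SAMPLE))

# Reassemble the whole record by following the integer keys back down.
record = conn.execute(
    "SELECT e.name, a.number, s.name, t.name "
    "FROM employee e "
    "JOIN address a ON a.id = e.address_id "
    "JOIN street s ON s.id = a.street_id "
    "JOIN town t ON t.id = a.town_id "
    "WHERE e.id = ?",
    (emp_id,),
).fetchone()
print(record)
```

On PostgreSQL the same idea would use a sequence per table (or the OID) in
place of SQLite's automatic key.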

A second strategy is to concatenate the hierarchy into the field names, such as
street_name, town_name, and so on.
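The flattening strategy might look like the sketch below. The function name
flatten and the underscore separator are assumptions. This version joins the
full path below the top-level record, so it produces address_street_name rather
than the shorter street_name; dropping intermediate levels would need an extra
rule and could produce name collisions.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<employee>
  <name>Bill</name>
  <position>Programmer</position>
  <address>
    <number>1290</number>
    <street><name>Canton Ave</name></street>
    <town><name>Milton</name></town>
  </address>
</employee>
"""

def flatten(elem, prefix=""):
    """Return {field_name: text} for every data tag below elem."""
    row = {}
    for child in elem:
        name = f"{prefix}{child.tag}"
        if len(child):
            # Container: fold its tag name into the names of its fields.
            row.update(flatten(child, name + "_"))
        else:
            row[name] = (child.text or "").strip()
    return row

flat = flatten(ET.fromstring(SAMPLE))
for field, value in flat.items():
    print(f"{field} = {value}")
```

This yields a single wide row per employee, which is simple to query but cannot
represent a container that repeats (say, two addresses) without further rules.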

What do you think?

--
5-4-3-2-1 Thunderbirds are GO!
------------------------
http://www.mohawksoft.com
