Re: Getting data from Xml to Postgresql database

From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: aravind chandu <avin_friends(at)yahoo(dot)com>
Cc: postgresql Forums <pgsql-general(at)postgresql(dot)org>
Subject: Re: Getting data from Xml to Postgresql database
Date: 2008-07-28 04:28:29
Message-ID: 488D4AED.7020802@postnewspapers.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

aravind chandu wrote:
> Hi,
>
> I have some data in XML format and i need to upload in postgresql database using stored procedure. Can someone tell
> me the step by step procedure of the same as i will be doing it for the
> first time.

As Stephanie Bortzmeyer already explained to you, there is no simple
step-by-step process.

What you need to do depends on the nature of the XML data, and on how
you want to store it in PostgreSQL. Do you wish to load the XML and
transform it into interrelated sets of records in one or more tables? Do
you want to insert the XML document into a single `xml' data field in
the database? etc.

Remember, XML is just a very general mark-up language. It can describe
practically anything.

I can't give you a step-by-step process to load it, but I can give you a
list of things to think about that will help you solve the problem. I'm
assuming that you want to load the XML and transform it into records in
related tables in the database.

- Look at the XML document. Is it a standard XML dialect of some sort?
If so, what does the dialect describe? How would you model that in a
database?

- If the dialect is not standard, what is the XML document describing?
What data does it contain and how is it arranged?

- What are the relationships between data items in the XML document?
Look for repeating sets of records. Look for 1:N relationships where a
"master" entry has zero, one, or more subsiduary entries. At this point,
you should be getting an idea of the "entities" described by the
document, which should translate to tables in a simple database design.

- Determine what the data types of the various items are, which data
items are required and which are optional, etc.

- Look for "key" fields. These will be unique, and might be either ID
numbers or alphanumeric identifiers/codes. If there are keys they might
be used to reference one entry from another, so they're important for:

You've now had a decent look at the data you need to import. You should
now be able to follow an entity relationship modelling process (well
documented on the 'net - use Google) to design your database.

With your database designed, you should now know enough to write and
load the DDL code for it, then write a program that loads and inserts
the XML data. The program should parse the XML into a DOM (or use SAX to
progressively read it, if more appropriate) then, using your knowledge
of its structure and meaning, transform it into data tuples that can be
inserted into the database.

Here's a very simple XML document I just made up.

<?xml encoding=utf-8>
<contact>
<firstname>Fred</lastname>
<lastname>Jones</lastname>
<phone type="mobile">000 000 000</phone>
<phone type="home">0000 0000</phone>
<address>
<street>55 Nowhere Rd</street>
<town>Sub Urbia</town>
<state>Western Australia</state>
<country>Australia</country>
<postcode>6999</postcode>
<ismailingaddress/>
<isresidentialaddress/>
</address>
<address>
<street>8 Void Street</street>
<town>Some Tiny Village</town>
<state>Western Australia</state>
<country>Australia</country>
<postcode>6888</postcode>
<isworkaddress/>
</address>
</contact>

It's pretty obvious that this document describes contacts, each of which
has a first and last name, a list of phone numbers, and a list of addresses.

Without a DTD to tell us the rules the document follows (must there be
at least one phone number? etc) we have to make some guesses about how
to store the data. We'll generate a synthetic primary key since no
really appropriate key is present (the only alternative would be to use
a composite (firstname,lastname) primary key, which has plenty of problems):

Something like this should be usable to store the data:

CREATE TABLE contact (
contact_id SERIAL PRIMARY KEY,
firstname text NOT NULL,
lastname text NOT NULL
);

CREATE TABLE phone (
phone_id SERIAL PRIMARY KEY,
contact_id integer REFERENCES contact(contact_id),
phonetype text,
phonenumber text NOT NULL
);

CREATE TABLE address (
address_id SERIAL PRIMARY KEY,
contact_id integer REFERENCES contact(contact_id),
street text,
town text,
state text,
country text,
postcode text,
ismailingaddress bool,
isstreetaddress bool,
isworkaddress bool
);

If you knew more about the rules the XML document structure and data
must follow you could produce a better design with NOT NULL constraints
where appropriate, etc. You might fold the "phone" table into the
"contact" table if the document only permitted "type" values of "home"
or "mobile" for the "phone" field. That sort of thing.

--
Craig Ringer

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Klint Gore 2008-07-28 05:24:57 Re: copy ... from stdin csv; and bytea
Previous Message David Wilson 2008-07-28 04:18:16 copy ... from stdin csv; and bytea