Re: Importing Many XML Records

From: "George Pavlov" <gpavlov(at)mynewplace(dot)com>
To: "Ron St-Pierre" <ronstp(at)mail(dot)com>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: Importing Many XML Records
Date: 2006-01-28 00:03:20
Message-ID: CCB89282FCE1024EA3DCE687A96A51640499775F@ehost010-6.exch010.intermedia.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

> I'm sure that this has been asked before but I can't find any
> reference to it in google, and the search facility on
> postgresql.org is currently down.

http://groups.google.com/groups?q=group%3Apgsql.*

provides the same with a slight delay but arguably a better user
interface.

> I have a large number of entries (possibly 10,000+) in an XML
> file that I need to import into the database (7.4 on Debian)
> on a daily basis. Does anyone have any recommendations
> concerning the best way to do this? Is there some tool I
> should use or should I create the code in java to parse and
> import the data?
>
> If anyone has done this before, I would appreciate hearing
> how they did this.

This is generally outside the scope of this list. I am guessing (since I
don't know much about your data format or goals), but you probably want
to first transform the XML into a format suitable for importation into
the database using COPY, or (much less desirable) a bunch of insert
statements. In either case you should become familiar with XSLT
processing and write yourself an XSLT template to do the job.

I deal with a similar task using Saxon and TagSoup (which I highly
recommend for XML that is not well-formatted) and create a CSV file out
of a multitude of XML files (or a single XML file), which can then be
COPY-ed into a PG table. Instead of a CSV file one could create a SQL
script file of INSERT statements. I recommend Jeni Tennison's "Beginning
XSLT" book as an excellent reference on the subject of XSLT.

Depending on what your XML looks like you may get away without XSLT at
all, but just preprocess it with awk, sed, perl (Template::Extract is a
useful module) or whatever strikes your fancy.

Other questions to answer are do you want the "records" to stay as XML
in the database or do you want to import them into a regular table
format? If the former you may want to get familiar with the pgxml (aka
xml2 module) so you can query the XML data once inside your database.

George

Browse pgsql-general by date

  From Date Subject
Next Message Thomas Kellerer 2006-01-28 00:10:56 Re: Postgres 8.1.2, Java, JDO, and case sensitivity woes
Previous Message Roger Hand 2006-01-27 23:59:38 Re: Postgres 8.1.2, Java, JDO, and case sensitivity woes