Re: large xml database

From: Rob Sargent <robjsargent(at)gmail(dot)com>
To: Viktor Bojović <viktor(dot)bojovic(at)gmail(dot)com>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: large xml database
Date: 2010-10-31 21:26:01
Message-ID: 4CCDDEE9.9010803@gmail.com
Lists: pgsql-sql

Viktor Bojović wrote:
>
>
> On Sun, Oct 31, 2010 at 9:42 PM, Rob Sargent
> <robjsargent(at)gmail(dot)com> wrote:
>
> Viktor Bojović wrote:
>
> On Sun, Oct 31, 2010 at 2:26 AM, James Cloos
> <cloos(at)jhcloos(dot)com> wrote:
>
> >>>>> "VB" == Viktor Bojovic' <viktor(dot)bojovic(at)gmail(dot)com
> <mailto:viktor(dot)bojovic(at)gmail(dot)com>
>
> <mailto:viktor(dot)bojovic(at)gmail(dot)com
> <mailto:viktor(dot)bojovic(at)gmail(dot)com>>> writes:
>
> VB> I have a very big XML document, larger than 50GB, and want to
> VB> import it into the database and transform it to a relational schema.
>
> Were I doing such a conversion, I'd use perl to convert the xml into
> something which COPY can grok. Any other language, script or
> compiled, would work just as well. The goal is to avoid having to
> slurp the whole xml structure into memory.
>
> -JimC
> --
> James Cloos <cloos(at)jhcloos(dot)com>
> OpenPGP: 1024D/ED7DAEA6
>
>
> The insertion into the database is not a very big problem.
> I insert it as XML docs, as varchar lines, or as XML docs in
> varchar format. Usually I use a transaction and commit after
> each block of 1000 inserts, and it goes very fast, so the
> insertion is done after a few hours.
> But the problem occurs when I want to transform it inside the
> database from XML (varchar or XML format) into tables by parsing.
> That processing takes too much time in the database, no matter
> whether it is stored as varchar lines, varchar nodes, or the XML
> data type.
>
> --
> ---------------------------------------
> Viktor Bojović
>
> ---------------------------------------
> Wherever I go, Murphy goes with me
>
>
> Are you saying you first load the xml into the database, then
> parse that xml into instances of objects (rows in tables)?
>
>
> Yes. That way takes less RAM than using Twig or SimpleXML, so I
> tried using PostgreSQL XML functions or regexes.
>
> --
> ---------------------------------------
> Viktor Bojović
> ---------------------------------------
> Wherever I go, Murphy goes with me
Is the entire load a set of "entry" elements, as your example contains?
I believe that would parse nicely into a tidy but non-trivial schema
directly, without the "middle-man" of having xml in the db (unless of
course you prefer xpath to sql ;) ). Something like the untested sketch
below would stream it straight into COPY.
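
Untested sketch, in the spirit of James' suggestion; the element and
column names (entry, accession, name) are guesses from your example,
and the escaping is deliberately crude:

    #!/usr/bin/perl
    # Stream-parse so the 50GB file is never held in memory:
    # XML::Twig calls flush_entry for each completed <entry>,
    # we print one tab-separated line for COPY, then purge.
    use strict;
    use warnings;
    use XML::Twig;

    my $twig = XML::Twig->new(
        twig_handlers => { entry => \&flush_entry },
    );
    $twig->parsefile( $ARGV[0] );

    sub flush_entry {
        my ( $twig, $entry ) = @_;
        my @cols = map {
            my $v = $entry->first_child_text($_);
            $v =~ s/[\t\n\\]/ /g;   # crude: blank out COPY metacharacters
            $v;
        } qw(accession name);
        print join( "\t", @cols ), "\n";
        $twig->purge;               # free finished entries; memory stays flat
    }

Then feed it straight to COPY (file and table names hypothetical):

    ./xml2copy.pl swissprot.xml | psql -c '\copy entry (accession, name) from stdin'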

The single most significant caveat I would have for you: beware,
biologists are involved. Inconsistency (or at least overloaded
concepts) is almost assured :). EMBL too is suspect, imho, but I've
been out of that arena for a while.
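
If you do stay with the xml-in-db route, one set-based unpack with
xpath() usually beats parsing row by row in a loop. Untested, and all
names here are hypothetical, assuming the raw documents sit in a
staging table raw_xml(doc xml):

    -- one INSERT ... SELECT over the whole staging table
    INSERT INTO entry (accession, name)
    SELECT (xpath('/entry/accession/text()', doc))[1]::text,
           (xpath('/entry/name/text()',      doc))[1]::text
    FROM   raw_xml;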
