large xml database

From: Viktor Bojović <viktor(dot)bojovic(at)gmail(dot)com>
To: pgsql-sql(at)postgresql(dot)org
Subject: large xml database
Date: 2010-10-30 21:49:29
Message-ID: AANLkTi=ZEO1W=FOxdH+S3twj_A1VCpyyzes7HFghJ7T-@mail.gmail.com
Lists: pgsql-sql

Hi,
I have a very big XML document, larger than 50 GB, that I want to import
into the database and transform into a relational schema.
When splitting this document into smaller independent XML documents, I get
~11.1 million XML documents.
I have spent a lot of time trying to find the fastest way to transform all
this data, but every time I give up because it takes too long. Sometimes it
would take more than a month if not stopped.
I have tried inserting each line into the database as varchar and parsing it
with plperl regexes.
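Roughly, that looked like the sketch below (the function, table, and element
names are only placeholders, not my real schema):

    CREATE OR REPLACE FUNCTION extract_name(line text) RETURNS text AS $$
        -- return the text between <name>...</name> on one line, or NULL
        my ($line) = @_;
        return $1 if $line =~ m{<name>([^<]+)</name>};
        return undef;
    $$ LANGUAGE plperl;

    -- applied over a staging table of raw lines
    SELECT extract_name(line) FROM raw_lines;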
I have also tried storing every document as XML and parsing it, but that is
too slow as well.
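That variant was essentially xpath() calls over an xml column, along these
lines (the table name and XPath expression are just examples):

    -- staging table holding one split document per row
    CREATE TABLE xml_docs (id serial PRIMARY KEY, doc xml);

    -- extract one field per document
    SELECT id,
           (xpath('/entry/accession/text()', doc))[1]::text AS accession
      FROM xml_docs;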
I have tried storing every document as varchar, but it is also slow when
using regexes to extract the data.
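There the extraction was done with SQL-level regexes, roughly like this
(again, the table name and pattern are only illustrative):

    -- doc is stored as plain text; substring() returns the captured group
    SELECT id,
           substring(doc from '<accession>([^<]+)</accession>') AS accession
      FROM text_docs;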

Many attempts have failed because 8 GB of RAM and 10 GB of swap were not
enough. Sometimes I also get an error that more than 2^32 operations were
performed, and the functions stop working.

I just wanted to ask whether anyone knows how to speed this up.

Thanks in advance.

--
---------------------------------------
Viktor Bojović
---------------------------------------
Wherever I go, Murphy goes with me
