PLPythonU & Out of Memory - Importing Query

From: Jon Clements <jon(dot)clements(at)trgstrata(dot)co(dot)uk>
To: "pgsql-interfaces(at)postgresql(dot)org" <pgsql-interfaces(at)postgresql(dot)org>
Subject: PLPythonU & Out of Memory - Importing Query
Date: 2005-11-08 14:21:01
Message-ID: 4c05f1b439ce248766e7cd5375a83ddc@readgroup.co.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-interfaces


Hi there,

I am currently experimenting using plpythonu with postgresql 8.0 for Win32. It's basically a quick script that imports data from CSV files, but does some quite complicated data lookups and selections. The area in which I'm somewhat confunded is memory usage. The process successfully runs, but keeps climbing in memory usage relentlessly, successfully importing about 200k records, before the memory usage of postgres soars to 2gb and of course, shortly after that, grinds to a halt with a "Out of Memory" error.

I'm not deliberately storing anything in the SD/GD dictionaries, and am not dealing with triggers...

create function blah(text) returns int8 as
$$
# Initialisation of plans
myplan = plpy.prepare('insert into tablename (var1,var2) values($1,$2)', ['text','text'] )
# Setup external CSV data source
# For each record, that meets certain critera, execute insert...
for rec in dsource: plpy.execute(myplan, [Value1, Value2] )

# Finishing stuff
return some_meaningful_value
$$
LANGUAGE PLPYTHONU;

Given I'm importing about 250 million records and only want to end up with about 4 million, is
1) This possible using the above?
2) Better suited to something else (I've looked at COPY but that would require the entire table be uploaded first, then filtered and I'd like to avoid that if necessary, or thinking about it, I spose a trigger could be written that responded on the copy?). Also the other thing is COPY is only applicable to simple text files, while I want this import script to be generic from whatever datasource it may be importing from (an ODBC/DBF/Berkeley DB format etc...)

Anyhow, thanks in advance for any help.
Any Qs, please gimme a yell.

Regards,

Jon.

________________________________________________________________________
This e-mail has been scanned for all viruses by Star. The
service is powered by MessageLabs. For more information on a proactive
anti-virus service working around the clock, around the globe, visit:
http://www.star.net.uk
________________________________________________________________________

Responses

Browse pgsql-interfaces by date

  From Date Subject
Next Message Tom Lane 2005-11-09 16:52:35 Re: PLPythonU & Out of Memory - Importing Query
Previous Message Greg Sabino Mullane 2005-11-07 13:01:15 Re: DBD::Pg returns 1/0 for boolean field ...