From: Ron Johnson <ron(dot)l(dot)johnson(at)cox(dot)net>
To: PgSQL Novice ML <pgsql-novice(at)postgresql(dot)org>
Subject: Better way to bulk-load millions of CSV records into postgres?
Date: 2002-05-21 20:40:00
Message-ID: 1022013600.16609.61.camel@rebel
Lists: pgsql-novice
Hi,
Currently, I've got a Python script using pyPgSQL that
parses each CSV record, builds a big
"INSERT INTO ... VALUES (...)" string, and then execute()s it.
top shows that this method keeps postmaster at ~70% CPU
utilization, and python at ~15%.
Still, it's only inserting ~190 recs/second. Is there a
better way to do this, or am I constrained by the hardware?
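For reference, a minimal sketch of the per-record approach described above (the `parts` table and its columns are made up for illustration, and the DB-API cursor/execute() step is omitted; the original ran under Python 2.1.3, but this sketch uses modern Python):

```python
import csv
import io

def quote(val):
    # Minimal string-literal escaping: double any embedded single quotes.
    # (Ignores NULLs, numeric types, encodings, etc. -- illustration only.)
    return "'" + val.replace("'", "''") + "'"

def make_insert(table, row):
    # Build one "INSERT INTO ... VALUES (...)" string per CSV record,
    # which would then be passed to cursor.execute().
    return "INSERT INTO %s VALUES (%s)" % (table, ", ".join(quote(v) for v in row))

# One statement per record -- one socket round trip each, which is
# where the time goes at ~190 recs/second.
data = io.StringIO("1,widget,9.99\n2,o'ring,0.25\n")
statements = [make_insert("parts", row) for row in csv.reader(data)]
```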
Instead of python and postmaster having to do a ton of data
transfer over sockets, I'm wondering if there's a way to send a
large number of CSV records (4000, for example) in one big
chunk to a stored procedure and have the engine process it
all.
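On the client side, that chunking idea might look something like this (the stored procedure name `load_csv_chunk` is hypothetical, and SQL quoting of the chunk text is glossed over; this only shows the batching):

```python
def chunked(lines, size):
    # Group CSV lines into chunks of `size` so each round trip
    # to the server carries many records instead of one.
    for i in range(0, len(lines), size):
        yield lines[i:i + size]

lines = ["%d,somedata" % n for n in range(10000)]
chunks = list(chunked(lines, 4000))

# Each chunk would then be joined with newlines and passed as one big
# text argument to a server-side function (hypothetical) that splits
# it and does the inserts itself, e.g.:
#   SELECT load_csv_chunk('<4000 newline-joined CSV records>')
```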
Linux 2.4.18
PostgreSQL 7.2.1
python 2.1.3
csv file on /dev/hda
table on /dev/hde (ATA/100)
--
+---------------------------------------------------------+
| Ron Johnson, Jr. Home: ron(dot)l(dot)johnson(at)cox(dot)net |
| Jefferson, LA USA http://ronandheather.dhs.org:81 |
| |
| "I have created a government of whirled peas..." |
| Maharishi Mahesh Yogi, 12-May-2002, |
| CNN, Larry King Live |
+---------------------------------------------------------+