Skip site navigation (1) Skip section navigation (2)

Re: Better way to bulk-load millions of CSV records into

From: Ron Johnson <ron(dot)l(dot)johnson(at)cox(dot)net>
To: PgSQL Novice ML <pgsql-novice(at)postgresql(dot)org>
Subject: Re: Better way to bulk-load millions of CSV records into
Date: 2002-05-22 17:18:02
Message-ID: 1022087882.19121.7.camel@rebel (view raw or flat)
Thread:
Lists: pgsql-novice
On Wed, 2002-05-22 at 09:19, Tom Sheehan wrote:
> Have you looked at the COPY command is psql  for this?  There are the, 'FROM
> { 'filename' | stdin } and [ [USING] DELIMITERS] clauses that may be of use.
> It seems to me that bulk loading is what they were designed for.

It doesn't handle csv files where there are double-quotes 
around each field.  Also, it runs it 1 big transaction,
and if it fails, you must load the whole thing over again...

From the mailing list archives, I see where COPY was only
designed for db_dump, but has had a few features added to
it.  It was not designed to be a full-featured bulk loader/unloader.

> ----- Original Message -----
> From: "Ron Johnson" <ron(dot)l(dot)johnson(at)cox(dot)net>
> To: "PgSQL Novice ML" <pgsql-novice(at)postgresql(dot)org>
> Sent: Tuesday, May 21, 2002 3:40 PM
> Subject: [NOVICE] Better way to bulk-load millions of CSV records into
> postgres?
> 
> 
> >
> > Hi,
> >
> > Currently, I've got a python script using pyPgSQL that
> > parses the CSV record, creates a string that is a big
> > "INSERT INTO VALUES (...)" command, then, execute() it.
> >
> > top shows that this method uses postmaster with ~70% CPU
> > utilization, and python with ~15% utilization.
> >
> > Still, it's only inserting ~190 recs/second.  Is there a
> > better way to do this, or am I constrained by the hardware?
> >
> > Instead of python and postmaster having to do a ton of data
> > xfer over sockets, I'm wondering if there's a way to send a
> > large number of csv records (4000, for example) in one big
> > chunk to a stored procedure and have the engine process it
> > all.
> >
> > Linux 2.4.18
> > PostgreSQL 7.2.1
> > python 2.1.3
> > csv file on /dev/hda
> > table on /dev/hde  (ATA/100)

-- 
+---------------------------------------------------------+
| Ron Johnson, Jr.        Home: ron(dot)l(dot)johnson(at)cox(dot)net     |
| Jefferson, LA  USA      http://ronandheather.dhs.org:81 |
|                                                         |
| "I have created a government of whirled peas..."        |
|   Maharishi Mahesh Yogi, 12-May-2002,                   |
!   CNN, Larry King Live                                  |
+---------------------------------------------------------+


In response to

pgsql-novice by date

Next:From: Ron JohnsonDate: 2002-05-22 17:40:03
Subject: Re: optimising data load
Previous:From: Tom LaneDate: 2002-05-22 16:36:45
Subject: Re: pl/perl Documentation

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group