Question, re: Running enormous batch file from the command line

From: "Milarsky, Jeremy F(dot)" <milarskyj(at)missouri(dot)edu>
To: <pgsql-general(at)postgresql(dot)org>
Subject: Question, re: Running enormous batch file from the command line
Date: 2008-11-02 20:13:34
Message-ID: D13DC547E73EF94896EAD4BCA84173BF04AB0E62@UM-XMAIL05.um.umsystem.edu
Lists: pgsql-general

I am attempting to move an unusually large legacy database from old DBF
files into a Postgres database. I've been able to clean up the DBF
tables, export them to CSV and convert that CSV into a SQL batch file.

My batch file contains 10 million INSERT queries to populate the
corresponding table in Postgres.

When I run it, I see only three errors, all of them having to do with my
admitted failure to escape single quotes in one of my fields when I
parsed the CSV.
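
(For reference, the fix on the generation side is just to double any embedded single quote before wrapping the field in quotes. A minimal sketch, using a stand-in value rather than my real data:)

```shell
# SQL escapes a literal single quote by doubling it, so the CSV-to-SQL
# step only needs to run each raw field value through s/'/''/g before
# wrapping it in quotes. Stand-in value, not real data:
field="O'BRIEN"
escaped=$(printf "%s" "$field" | sed "s/'/''/g")
printf "'%s'\n" "$escaped"    # prints 'O''BRIEN'
```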

[my_command_line_prompt]$ psql mydatabase -f mybatchfile.sql > log.txt

Password:

psql:mybatchfile.sql:2279459: ERROR: syntax error at or near "P" at
character 68

psql:mybatchfile.sql:2279459: LINE 1: ...VALUES
('XXXXXXXXX','SMITH','','JOHN',''','P','09','3...

psql:mybatchfile.sql:2279459:
^

psql:mybatchfile.sql:3117024: ERROR: syntax error at or near "','" at
character 64

psql:mybatchfile.sql:3117024: LINE 1: ...mf1 VALUES
('XXXXXXXXX','SMITH','','JOHN',''',' ','10','28...

psql:mybatchfile.sql:3117024:
^

psql:mybatchfile.sql:6775763: ERROR: syntax error at or near "P" at
character 67

psql:mybatchfile.sql:6775763: LINE 1: ... VALUES
('XXXXXXXXXX','SMITH','','JOHN',''','P','01','0...

psql:mybatchfile.sql:6775763:

However, when I go into the psql prompt afterwards and do a SELECT
COUNT(*) ... on the table, I see there are only about 4.5 million
records in the table. Consistent with that, the log.txt file I'm
redirecting to in the command above has about 4.5 million lines, each
saying "INSERT 0 1".

So is Postgres skipping over some 5+ million lines in my batch file? If
so, why is it reporting only three errors? How else can I see what's
going wrong with those 5 million INSERT statements? Are there any
other factors that could cause this sort of data loss?
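
(One way to quantify the gap: count the INSERT statements in the batch file against the "INSERT 0 1" confirmations in the log. A sketch with tiny stand-in files; substitute mybatchfile.sql and log.txt for the real check:)

```shell
# Stand-in batch file and log, in place of mybatchfile.sql / log.txt:
printf "INSERT INTO mf1 VALUES (1);\nINSERT INTO mf1 VALUES (2);\nINSERT INTO mf1 VALUES (3);\n" > batch_demo.sql
printf "INSERT 0 1\nINSERT 0 1\n" > log_demo.txt

attempted=$(grep -c '^INSERT INTO' batch_demo.sql)   # statements in the file
succeeded=$(grep -c '^INSERT 0 1' log_demo.txt)      # statements psql confirmed
echo "attempted=$attempted succeeded=$succeeded missing=$((attempted - succeeded))"
```

Re-running with stderr captured to its own file (`2> errors.txt`) would also guarantee no error messages were lost off-screen, since psql writes errors to stderr rather than stdout.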

I have, for the record, attempted to get all 10 million rows in with a
single COPY command. That method has not been successful, apparently due
to client encoding issues (despite executing several different "set
client_encoding" statements across my attempts).
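
(For reference, the form of what I tried; LATIN1 was one of several encoding values I attempted, and the file name here is a stand-in:)

```
mydatabase=# SET client_encoding = 'LATIN1';
mydatabase=# \copy mf1 FROM 'mydata.csv' WITH CSV
```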

If this sort of question has been asked before, I apologize - but I did
search the mailing list archive before posting.

Thanks in advance.

======

Jeremy Milarsky
