Re: Finding Errors in .csv Input Data

From: Adrian Klaver <adrian(dot)klaver(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org, Rich Shepard <rshepard(at)appl-ecosys(dot)com>
Subject: Re: Finding Errors in .csv Input Data
Date: 2011-02-23 01:45:53
Message-ID: 201102221745.54241.adrian.klaver@gmail.com
Lists: pgsql-general

On Tuesday, February 22, 2011 5:10:34 pm Rich Shepard wrote:
> I'm sure many of you have solved this problem in the past and can offer
> solutions that will work for me. The context is a 73-column postgres table
> of data that was originally in an Access .mdb file. A colleague loaded the
> file into Access and wrote a .csv file for me to use since we have nothing
> Microsoft here. There are 110,752 rows in the file/table. After a lot of
> cleaning with emacs and sed, the copy command accepted all but 80 rows of
> data. Now I need to figure out why postgres reports them as having too many
> columns.
>
> Starting to work with a single row, I first compared my cleaned row to
> the raw .csv from the Access output. They match column-by-column. Then I
> copied the schema to a text file and started comparing the .csv data
> column-by-column. While this looks OK to me, postgres doesn't like it. For
> example, I get this error message:
>
> nevada=# \copy water_well from 'one.csv' with delimiter '|' null '' CSV;
> ERROR: value too long for type character(1)
> CONTEXT: COPY water_well, line 1, column gravel_packed: "106"
>
> Yet, the column comparison for gravel_packed and surrounding attributes
> does not show this:
>
> lot_no TEXT, |
> block_no TEXT, |
> well_finish_date DATE, 11/15/1948|
> date_cmplt_acc CHAR(1), D|
> gravel_packed CHAR(1), |
> depth_seal INTEGER, |
> depth_drilled INTEGER, 106|
> depth_bedrock INTEGER, |
> aquifer_desc TEXT, |
>
> Notice the NULL value for gravel_packed, while the "106" is for
> depth_drilled, two columns later.
>
> I'm stymied and wonder if there's a tool I can use to fix these 80 rows
> so the copy command will accept them.
>
> Rich

We are going to need to see at least a sample of the actual data in one.csv that
is causing the problem. You have an off-by-two error, as you suggest, but the
shift could actually have happened earlier in the row. For instance, the
well_finish_date value would insert into lot_no without complaint, because
lot_no is TEXT and, at this point, the date value is just text. The same goes
for date_cmplt_acc and block_no.
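One way to narrow down which of the 80 rows are shifted is to count fields per row before handing the file to COPY. This is a minimal sketch, not part of the original thread; the expected column count (73) and the pipe delimiter come from the discussion above, and using Python's csv module matters because a stray delimiter inside a quoted field is a common cause of exactly this kind of shift:

```python
# Sketch: report rows in a pipe-delimited CSV whose field count differs
# from the expected number of columns (73 in the thread's table).
import csv
import io

def find_bad_rows(text, expected_cols):
    """Return (line_number, field_count) pairs for rows with the wrong width."""
    bad = []
    # csv.reader respects CSV quoting, so a '|' inside a quoted field
    # is not miscounted as a delimiter, matching COPY ... CSV behavior.
    reader = csv.reader(io.StringIO(text), delimiter='|')
    for lineno, row in enumerate(reader, start=1):
        if len(row) != expected_cols:
            bad.append((lineno, len(row)))
    return bad

# Toy sample: the second row has one field too many.
sample = "a|b|c\n1|2|3|4\nx|y|z\n"
print(find_bad_rows(sample, 3))  # -> [(2, 4)]
```

In practice you would open one.csv instead of the inline sample and set expected_cols to 73; rows that count correctly here but still fail in COPY point to a value landing in the wrong column rather than a missing or extra delimiter.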
--
Adrian Klaver
adrian(dot)klaver(at)gmail(dot)com
