Re: Finding Errors in .csv Input Data

From: Rich Shepard <rshepard(at)appl-ecosys(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Finding Errors in .csv Input Data
Date: 2011-02-23 02:58:40
Message-ID: alpine.LNX.2.00.1102221853060.32123@salmo.appl-ecosys.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Tue, 22 Feb 2011, Adrian Klaver wrote:

> We are going to need to see at least a sample of the actual data in
> one.csv that is causing the problem. You have an off by two error as you
> suggest, but that could actually have happened earlier in the row. For
> instance the well_finish_date would insert into lot_no because lot_no is
> TEXT and the date value at this point is just text. Same with
> date_cmplt_acc and block_no.

Adrian/Andy,

The data came out of Access as comma-and-quote csv. I massaged it in emacs
and sed to change the delimiter to a pipe rather than a comma and removed
the double quotes.

I cannot assume that each of the 80 problem rows suffer from the defect in
the same place, so if there's a generic process I can apply it row-by-row.
After all, 80 problem rows out of 110,752 is not bad.

Here are the schema and the first row, in one.csv. Because I don't know if
attachments are stripped off before the message is allowed to be distributed
to the list subscribers, I'll just include both here.

DDL:

The column names are original (except for 'ref' that seems to be a
reserved word), but I modifed the data types.

CREATE TABLE water_well (
sequence_no TEXT PRIMARY KEY,
well_log TEXT,
app VARCHAR(20),
notice_of_intent VARCHAR(6),
waiver_no VARCHAR(30),
date_log_rcvd DATE,
date_log_rcvd_acc CHAR(1),
site_type CHAR(1),
work_type CHAR(1),
work_type_rmks TEXT,
proposed_use CHAR(1),
drilling_method CHAR(1),
sc TEXT,
ha TEXT,
twn VARCHAR(3),
legal_twn VARCHAR(3),
rng VARCHAR(3),
legal_rng VARCHAR(3),
sec TEXT,
sec_quarters TEXT,
legal_quarters TEXT,
quarters_seq TEXT,
ref TEXT,
latitude NUMERIC(9,6),
longitude NUMERIC(9,6),
lat_long_src VARCHAR(5),
lat_long_acc CHAR(1),
owner_current TEXT,
owner_address TEXT,
owner_no TEXT,
parcel_no TEXT,
subdivision_name TEXT,
lot_no TEXT,
block_no TEXT,
well_finish_date DATE,
date_cmplt_acc CHAR(1),
gravel_packed CHAR(1),
depth_seal INTEGER,
depth_drilled INTEGER,
depth_bedrock INTEGER,
aquifer_desc TEXT,
depth_cased INTEGER,
csng_diameter FLOAT,
csng_reductions INTEGER,
top_perf INTEGER,
bottom_perf INTEGER,
perf_intervals INTEGER,
static_wl FLOAT,
temperature FLOAT,
yield FLOAT,
drawdown FLOAT,
hours_pumped FLOAT,
test_method CHAR(1),
qual_const_data CHAR(1),
qual_lith_data CHAR(1),
remarks TEXT,
remarks_additional TEXT,
contractor_lic_no VARCHAR(8),
contractor_name TEXT,
contractor_address TEXT,
contractor_drlr_no VARCHAR(6),
driller_lic_no VARCHAR(6),
source_agency TEXT,
user_id TEXT,
date_entry DATE,
update_user_id VARCHAR(16),
date_update DATE,
edit_status VARCHAR(16),
well_start_date DATE,
gravel_pack_top INTEGER,
gravel_pack_bot INTEGER,
utm_x NUMERIC(13,6),
utm_y NUMERIC(13,6)
);

Here's one.csv:

68670|724||0||11/27/1948|D|N|N||H|C|32031|087|N18|18N|E20|20E|07||||MD|39.44|119.77|NV003|M|KAIPER, R L|||SIERRA MANOR||11/15/1948|D|||106|||106|6.62|0|60|102|1|12.00||30.00|||B|G|G|AIR COMPRESSOR TESTED 30 GPM ALSO||3|MEL MEYER|RT 1 BOX 10 RENO|||3|NV003|JSWINGHOLM|1/16/2003|||F|11/11/1948|||261013.36|4369139.23

I hope you're able to see what I keep missing as the source of the
problem.

Rich

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message John R Pierce 2011-02-23 03:33:13 Re: Mysql to Postgresql
Previous Message Adrian Klaver 2011-02-23 01:45:53 Re: Finding Errors in .csv Input Data