Re: multiline CSV fields

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: multiline CSV fields
Date: 2004-11-12 09:15:40
Message-ID: 41947F3C.2050906@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches


This example should fail on data line 2 or 3 on any platform,
regardless of the platform's line-end convention, although I haven't
tested on Windows.

cheers

andrew

[andrew(at)aloysius inst]$ bin/psql -e -f csverr.sql ; od -c
/tmp/csverrtest.csv
create table csverrtest (a int, b text, c int);
CREATE TABLE
insert into csverrtest values(1,'a',1);
INSERT 122471 1
insert into csverrtest values(2,'foo\r\nbar',2);
INSERT 122472 1
insert into csverrtest values(3,'baz\nblurfl',3);
INSERT 122473 1
insert into csverrtest values(4,'d',4);
INSERT 122474 1
insert into csverrtest values(5,'e',5);
INSERT 122475 1
copy csverrtest to '/tmp/csverrtest.csv' csv;
COPY
truncate csverrtest;
TRUNCATE TABLE
copy csverrtest from '/tmp/csverrtest.csv' csv;
psql:cvserr.sql:9: ERROR: literal carriage return found in data
HINT: Use "\r" to represent carriage return.
CONTEXT: COPY csverrtest, line 2: "2,"foo"
drop table csverrtest;
DROP TABLE
0000000 1 , a , 1 \n 2 , " f o o \r \n b a
0000020 r " , 2 \n 3 , " b a z \n b l u r
0000040 f l " , 3 \n 4 , d , 4 \n 5 , e ,
0000060 5 \n
0000062
[andrew(at)aloysius inst]$

Bruce Momjian wrote:

>Can I see an example of such a failure line?
>
>---------------------------------------------------------------------------
>
>Andrew Dunstan wrote:
>
>
>>Darcy Buskermolen has drawn my attention to unfortunate behaviour of
>>COPY CSV with fields containing embedded line end chars if the embedded
>>sequence isn't the same as those of the file containing the CSV data. In
>>that case we error out when reading the data in. This means there are
>>cases where we can produce a CSV data file which we can't read in, which
>>is not at all pleasant.
>>
>>Possible approaches to the problem:
>>. make it a documented limitation
>>. have a "csv read" mode for backend/commands/copy.c:CopyReadLine() that
>>relaxes some of the restrictions on inconsistent line endings
>>. escape embedded line end chars
>>
>>The last really isn't an option, because the whole point of CSVs is to
>>play with other programs, and my understanding is that those that
>>understand multiline fields (e.g. Excel) expect them not to be escaped,
>>and do not produce them escaped.
>>
>>So right now I'm tossing up in my head between the first two options. Or
>>maybe there's another solution I haven't thought of.
>>
>>Thoughts?
>>
>>cheers
>>
>>andrew
>>
>>---------------------------(end of broadcast)---------------------------
>>TIP 1: subscribe and unsubscribe commands go to majordomo(at)postgresql(dot)org
>>
>>
>>
>
>
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Patrick B Kelly 2004-11-12 15:45:50 Re: multiline CSV fields
Previous Message Oleg Bartunov 2004-11-12 05:34:49 Re: ltree PostgreSQL Module

Browse pgsql-patches by date

  From Date Subject
Next Message Patrick B Kelly 2004-11-12 15:45:50 Re: multiline CSV fields
Previous Message Gavin Sherry 2004-11-12 05:38:10 PITR docs enhancements