Re: multiline CSV fields

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: multiline CSV fields
Date: 2004-11-29 03:32:25
Message-ID: 200411290332.iAT3WPX22551@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches


OK, what solutions do we have for this? Not being able to load dumped
data is a serious bug. I have added this to the open items list:

* fix COPY CSV with \r,\n in data

My feeling is that if we are in a quoted string we just process whatever
characters we find, even passing through an EOL. I realize it might not
mark missing quote errors well but that seems minor compared to not
loading valid data.

---------------------------------------------------------------------------

Andrew Dunstan wrote:
>
> This example should fail on data line 2 or 3 on any platform,
> regardless of the platform's line-end convention, although I haven't
> tested on Windows.
>
> cheers
>
> andrew
>
> [andrew(at)aloysius inst]$ bin/psql -e -f csverr.sql ; od -c
> /tmp/csverrtest.csv
> create table csverrtest (a int, b text, c int);
> CREATE TABLE
> insert into csverrtest values(1,'a',1);
> INSERT 122471 1
> insert into csverrtest values(2,'foo\r\nbar',2);
> INSERT 122472 1
> insert into csverrtest values(3,'baz\nblurfl',3);
> INSERT 122473 1
> insert into csverrtest values(4,'d',4);
> INSERT 122474 1
> insert into csverrtest values(5,'e',5);
> INSERT 122475 1
> copy csverrtest to '/tmp/csverrtest.csv' csv;
> COPY
> truncate csverrtest;
> TRUNCATE TABLE
> copy csverrtest from '/tmp/csverrtest.csv' csv;
> psql:cvserr.sql:9: ERROR: literal carriage return found in data
> HINT: Use "\r" to represent carriage return.
> CONTEXT: COPY csverrtest, line 2: "2,"foo"
> drop table csverrtest;
> DROP TABLE
> 0000000 1 , a , 1 \n 2 , " f o o \r \n b a
> 0000020 r " , 2 \n 3 , " b a z \n b l u r
> 0000040 f l " , 3 \n 4 , d , 4 \n 5 , e ,
> 0000060 5 \n
> 0000062
> [andrew(at)aloysius inst]$
>
> Bruce Momjian wrote:
>
> >Can I see an example of such a failure line?
> >
> >---------------------------------------------------------------------------
> >
> >Andrew Dunstan wrote:
> >
> >
> >>Darcy Buskermolen has drawn my attention to unfortunate behaviour of
> >>COPY CSV with fields containing embedded line end chars if the embedded
> >>sequence isn't the same as those of the file containing the CSV data. In
> >>that case we error out when reading the data in. This means there are
> >>cases where we can produce a CSV data file which we can't read in, which
> >>is not at all pleasant.
> >>
> >>Possible approaches to the problem:
> >>. make it a documented limitation
> >>. have a "csv read" mode for backend/commands/copy.c:CopyReadLine() that
> >>relaxes some of the restrictions on inconsistent line endings
> >>. escape embedded line end chars
> >>
> >>The last really isn't an option, because the whole point of CSVs is to
> >>play with other programs, and my understanding is that those that
> >>understand multiline fields (e.g. Excel) expect them not to be escaped,
> >>and do not produce them escaped.
> >>
> >>So right now I'm tossing up in my head between the first two options. Or
> >>maybe there's another solution I haven't thought of.
> >>
> >>Thoughts?
> >>
> >>cheers
> >>
> >>andrew
> >>
> >>---------------------------(end of broadcast)---------------------------
> >>TIP 1: subscribe and unsubscribe commands go to majordomo(at)postgresql(dot)org
> >>
> >>
> >>
> >
> >
> >
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2004-11-29 03:33:51 Re: [HACKERS] Adding Reply-To: <listname> to Lists configuration ...
Previous Message Marc G. Fournier 2004-11-29 03:32:21 Re: [HACKERS] Adding Reply-To: <listname> to Lists configuration

Browse pgsql-patches by date

  From Date Subject
Next Message Neil Conway 2004-11-29 03:33:55 Re: BUG #1329: Bug in IF-ELSEIF-ELSE construct
Previous Message Bruce Momjian 2004-11-29 03:04:12 Re: Updated Turkish FAQ