Skip site navigation (1) Skip section navigation (2)

Re: multiline CSV fields

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: multiline CSV fields
Date: 2004-11-29 03:32:25
Message-ID: 200411290332.iAT3WPX22551@candle.pha.pa.us (view raw or flat)
Thread:
Lists: pgsql-hackerspgsql-patches
OK, what solutions do we have for this?  Not being able to load dumped
data is a serious bug.  I have added this to the open items list:

	* fix COPY CSV with \r,\n in data

My feeling is that if we are in a quoted string we just process whatever
characters we find, even passing through an EOL.  I realize it might not
mark missing quote errors well but that seems minor compared to not
loading valid data.

---------------------------------------------------------------------------

Andrew Dunstan wrote:
> 
> This example should fail  on data line 2 or 3 on any platform, 
> regardless of the platform's line-end convention, although I haven't 
> tested on Windows.
> 
> cheers
> 
> andrew
> 
> [andrew(at)aloysius inst]$ bin/psql -e -f csverr.sql ; od -c 
> /tmp/csverrtest.csv
> create table csverrtest (a int, b text, c int);
> CREATE TABLE
> insert into csverrtest values(1,'a',1);
> INSERT 122471 1
> insert into csverrtest values(2,'foo\r\nbar',2);
> INSERT 122472 1
> insert into csverrtest values(3,'baz\nblurfl',3);
> INSERT 122473 1
> insert into csverrtest values(4,'d',4);
> INSERT 122474 1
> insert into csverrtest values(5,'e',5);
> INSERT 122475 1
> copy csverrtest to '/tmp/csverrtest.csv' csv;
> COPY
> truncate csverrtest;
> TRUNCATE TABLE
> copy csverrtest from '/tmp/csverrtest.csv' csv;
> psql:cvserr.sql:9: ERROR:  literal carriage return found in data
> HINT:  Use "\r" to represent carriage return.
> CONTEXT:  COPY csverrtest, line 2: "2,"foo"
> drop table csverrtest;
> DROP TABLE
> 0000000   1   ,   a   ,   1  \n   2   ,   "   f   o   o  \r  \n   b   a
> 0000020   r   "   ,   2  \n   3   ,   "   b   a   z  \n   b   l   u   r
> 0000040   f   l   "   ,   3  \n   4   ,   d   ,   4  \n   5   ,   e   ,
> 0000060   5  \n
> 0000062
> [andrew(at)aloysius inst]$
> 
> Bruce Momjian wrote:
> 
> >Can I see an example of such a failure line?
> >
> >---------------------------------------------------------------------------
> >
> >Andrew Dunstan wrote:
> >  
> >
> >>Darcy Buskermolen has drawn my attention to unfortunate behaviour of 
> >>COPY CSV with fields containing embedded line end chars if the embedded 
> >>sequence isn't the same as those of the file containing the CSV data. In 
> >>that case we error out when reading the data in. This means there are 
> >>cases where we can produce a CSV data file which we can't read in, which 
> >>is not at all pleasant.
> >>
> >>Possible approaches to the problem:
> >>. make it a documented limitation
> >>. have a "csv read" mode for backend/commands/copy.c:CopyReadLine() that 
> >>relaxes some of the restrictions on inconsistent line endings
> >>. escape embedded line end chars
> >>
> >>The last really isn't an option, because the whole point of CSVs is to 
> >>play with other programs, and my understanding is that those that 
> >>understand multiline fields (e.g. Excel) expect them not to be escaped, 
> >>and do not produce them escaped.
> >>
> >>So right now I'm tossing up in my head between the first two options. Or 
> >>maybe there's another solution I haven't thought of.
> >>
> >>Thoughts?
> >>
> >>cheers
> >>
> >>andrew
> >>
> >>---------------------------(end of broadcast)---------------------------
> >>TIP 1: subscribe and unsubscribe commands go to majordomo(at)postgresql(dot)org
> >>
> >>    
> >>
> >
> >  
> >
> 

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman(at)candle(dot)pha(dot)pa(dot)us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

In response to

Responses

pgsql-hackers by date

Next:From: Bruce MomjianDate: 2004-11-29 03:33:51
Subject: Re: [HACKERS] Adding Reply-To: <listname> to Lists configuration ...
Previous:From: Marc G. FournierDate: 2004-11-29 03:32:21
Subject: Re: [HACKERS] Adding Reply-To: <listname> to Lists configuration

pgsql-patches by date

Next:From: Neil ConwayDate: 2004-11-29 03:33:55
Subject: Re: BUG #1329: Bug in IF-ELSEIF-ELSE construct
Previous:From: Bruce MomjianDate: 2004-11-29 03:04:12
Subject: Re: Updated Turkish FAQ

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group