Re: multiline CSV fields

From: Patrick B Kelly <pbk(at)patrickbkelly(dot)org>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: multiline CSV fields
Date: 2004-11-11 23:03:58
Message-ID: F82E5F5D-3435-11D9-B14C-000A958A3956@patrickbkelly.org
Lists: pgsql-hackers pgsql-patches


On Nov 11, 2004, at 2:56 PM, Andrew Dunstan wrote:

>
>
> Tom Lane wrote:
>
>> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>>
>>> Patrick B Kelly wrote:
>>>
>>>> Actually, when I try to export a sheet with multi-line cells from
>>>> excel, it tells me that this feature is incompatible with the CSV
>>>> format and will not include them in the CSV file.
>>>>
>>
>>
>>> It probably depends on the version. I have just tested with Excel
>>> 2000 on a WinXP machine and it both read and wrote these files.
>>>
>>
>> I'd be inclined to define Excel 2000 as broken, honestly, if it's
>> writing unescaped newlines as data. To support this would mean throwing
>> away most of our ability to detect incorrectly formatted CSV files.
>> A simple error like a missing close quote would look to the machine like
>> the rest of the file is a single long data line where all the newlines
>> are embedded in data fields. How likely is it that you'll get a useful
>> error message out of that? Most likely the error message would point to
>> the end of the file, or at least someplace well removed from the actual
>> mistake.
>>
>> I would vote in favor of removing the current code that attempts to
>> support unquoted newlines, and waiting to see if there are complaints.
>>
>>
>>
>
> This feature was specifically requested when we discussed what sort of
> CSVs we would handle.
>
> And it does in fact work as long as the newline style is the same.
>
> I just had an idea. How about if we add a new CSV option MULTILINE. If
> absent, then on output we would not output unescaped LF/CR characters
> and on input we would not allow fields with embedded unescaped LF/CR
> characters. In both cases we could error out for now, with perhaps an
> 8.1 TODO to provide some other behaviour.
>
> Or we could drop the whole multiline "feature" for now and make the
> whole thing an 8.1 item, although it would be a bit of a pity when it
> does work in what will surely be the most common case.
>

What about just coding an FSM (finite state machine) into
backend/commands/copy.c:CopyReadLine() that does not treat any flavor
of NL character as a line terminator while it is inside a data field?
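
To make the idea concrete, here is a rough standalone sketch of such a
state machine. It is not taken from copy.c; the names CsvState and
csv_record_length are hypothetical, and it assumes the quote character
is escaped by doubling it. It only finds where the first logical record
ends, treating LF, CR and CRLF as terminators only outside quotes.

```c
#include <stddef.h>

/* Hypothetical states for illustration; not the names used in copy.c. */
typedef enum
{
    CSV_OUTSIDE,     /* not inside a quoted field */
    CSV_IN_QUOTES,   /* inside a quoted field: newlines are data */
    CSV_QUOTE_SEEN   /* quote seen inside a quoted field: either the
                      * closing quote or the first half of a doubled one */
} CsvState;

/*
 * Return the length of the first logical record in buf, including its
 * line terminator, or -1 if the buffer ends before a terminator is
 * found outside of quotes.
 */
long
csv_record_length(const char *buf, size_t len, char quote)
{
    CsvState state = CSV_OUTSIDE;
    size_t   i;

    for (i = 0; i < len; i++)
    {
        char c = buf[i];

        if (state == CSV_QUOTE_SEEN)
        {
            if (c == quote)
            {
                /* doubled quote: a literal quote, still inside the field */
                state = CSV_IN_QUOTES;
                continue;
            }
            /* the previous quote closed the field */
            state = CSV_OUTSIDE;
        }

        if (state == CSV_IN_QUOTES)
        {
            if (c == quote)
                state = CSV_QUOTE_SEEN;
            continue;           /* CR and LF are plain data here */
        }

        /* CSV_OUTSIDE */
        if (c == quote)
            state = CSV_IN_QUOTES;
        else if (c == '\n')
            return (long) (i + 1);
        else if (c == '\r')
        {
            /* treat CRLF as a single terminator */
            if (i + 1 < len && buf[i + 1] == '\n')
                return (long) (i + 2);
            return (long) (i + 1);
        }
    }

    return -1;                  /* no terminator outside quotes */
}
```

Note that this does not remove the problem Tom describes: with a
missing close quote the machine stays in CSV_IN_QUOTES and the function
returns -1 only once it has consumed the rest of the input.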

Patrick B. Kelly
------------------------------------------------------
http://patrickbkelly.org
