Re: multiline CSV fields

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: multiline CSV fields
Date: 2004-11-30 19:11:28
Message-ID: 41ACC5E0.2040507@dunslane.net
Lists: pgsql-hackers pgsql-patches

Bruce Momjian wrote:

>I am wondering if one good solution would be to pre-process the input
>stream in copy.c to convert newline to \n and carriage return to \r and
>double data backslashes and tell copy.c to interpret those like it does
>for normal text COPY files. That way, the changes to copy.c might be
>minimal; basically, place a filter in front of the CSV file that cleans
>up the input so it can be more easily processed.
>
>

Such a filter would itself have to parse the input stream, because it
would need to know which CRs and LFs were part of the data and so should
be escaped, and which really ended data lines and so should be left alone.
However, while the idea is basically sound, parsing the stream twice
seems crazy. My argument has been that at this stage in the dev cycle we
should document the limitation, maybe issue a warning as you want, and
make the more invasive code changes to fix it properly in 8.1. If you
don't want to wait, then following your train of thought a bit, ISTM
that the correct solution is a routine for CSV mode that combines the
functions of CopyReadAttributeCSV() and CopyReadLine(). Then we'd have a
genuine and fast fix for Greg's and Darcy's problem.
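
To make the idea concrete, here is a rough standalone sketch of a reader
that finds the end of a record and splits it into fields in the same
pass, tracking quote state so that a CR or LF inside a quoted field is
kept as data. It is not the copy.c code and not a proposed patch:
read_csv_record() and MAX_FIELD_LEN are names made up for illustration,
and it assumes a fixed '"' quote character, a ',' delimiter, and doubled
quotes as the escape.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_FIELD_LEN 256		/* hypothetical limit, for the sketch only */

/*
 * Hypothetical single-pass reader: consume one CSV record from buf,
 * starting at *pos, splitting it into fields along the way.  A CR or LF
 * seen inside quotes is data; outside quotes it ends the record.
 * Returns the number of fields stored, or -1 at end of input.  Fields
 * beyond max_fields and bytes beyond MAX_FIELD_LEN - 1 are dropped.
 */
int
read_csv_record(const char *buf, size_t *pos,
				char fields[][MAX_FIELD_LEN], int max_fields)
{
	size_t		len = strlen(buf);
	size_t		i = *pos;
	size_t		flen = 0;
	int			nfields = 0;
	int			in_quote = 0;

	assert(max_fields > 0);
	if (i >= len)
		return -1;

	while (i < len)
	{
		char		c = buf[i++];

		if (in_quote)
		{
			if (c == '"')
			{
				if (i < len && buf[i] == '"')
					i++;		/* doubled quote: keep one literal quote */
				else
				{
					in_quote = 0;	/* closing quote */
					continue;
				}
			}
			/* anything else inside quotes, CR and LF included, is data */
		}
		else if (c == '"')
		{
			in_quote = 1;		/* opening quote */
			continue;
		}
		else if (c == ',')
		{
			if (nfields < max_fields)
				fields[nfields++][flen] = '\0';
			flen = 0;
			continue;
		}
		else if (c == '\n' || c == '\r')
		{
			if (c == '\r' && i < len && buf[i] == '\n')
				i++;			/* treat CRLF as a single line ending */
			break;				/* unquoted line ending: record is done */
		}

		if (nfields < max_fields && flen < MAX_FIELD_LEN - 1)
			fields[nfields][flen++] = c;
	}

	/* terminate the last field (also reached on an unterminated quote) */
	if (nfields < max_fields)
		fields[nfields++][flen] = '\0';
	*pos = i;
	return nfields;
}
```

A caller would invoke this repeatedly with the same pos until it returns
-1. The point is only that the line-ending decision and the field
splitting share one quote-state variable, so the input is scanned once.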

cheers

andrew
