Re: multiline CSV fields

From: Patrick B Kelly <pbk(at)patrickbkelly(dot)org>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: multiline CSV fields
Date: 2004-11-12 03:35:07
Message-ID: D97EBB68-345B-11D9-B14C-000A958A3956@patrickbkelly.org
Lists: pgsql-hackers pgsql-patches


On Nov 11, 2004, at 10:07 PM, Andrew Dunstan wrote:

>
>
> Patrick B Kelly wrote:
>
>>
>>
>>
>> My suggestion is to simply have CopyReadLine recognize these two
>> states (in-field and out-of-field) and execute the current logic only
>> while in the second state. It would not be too hard but as you
>> mentioned it is non-trivial.
>>
>>
>>
>
> We don't know what state we expect the end of line to be in until
> after we have actually read the line. To know how to treat the end of
> line on your scheme we would have to parse as we go rather than after
> reading the line as now. Changing this would not only be
> non-trivial but significantly invasive to the code.
>
>

Perhaps I am misunderstanding the code. As I read it, the code currently
goes through the input character by character looking for NL and EOF
characters. It appears to be very well structured for what I am
proposing. The section in question is a small and clearly defined loop
which reads the input one character at a time and decides when it has
reached the end of the line or file. Each call of CopyReadLine attempts
to get one more line. I would propose that each call start out in
the out-of-field state and that the state be toggled by each un-escaped
quote encountered in the stream. While in the in-field state, it
would look only for the next un-escaped quote; while in the
out-of-field state, it would execute the existing logic as well as
look for the next un-escaped quote.
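
The two-state scan described above can be sketched roughly as follows.
This is only an illustration in Python, not the C code in copy.c, and
it makes no claim about how CopyReadLine is actually structured. The
function name read_csv_line and its quote and escape parameters are
hypothetical, and the sketch assumes a line ends at a bare "\n".

```python
import io


def read_csv_line(stream, quote='"', escape='"'):
    """Return the next logical line from stream, or None at EOF.

    A newline seen while inside a quoted field is kept as data
    instead of ending the line. (Hypothetical helper, for illustration.)
    """
    buf = []
    in_field = False  # every call starts in the out-of-field state
    while True:
        ch = stream.read(1)
        if ch == "":  # EOF: return what was collected, if anything
            return "".join(buf) if buf else None
        if in_field and ch == escape and escape != quote:
            # Distinct escape character: copy it and the character it
            # escapes without letting either one toggle the state.
            buf.append(ch)
            buf.append(stream.read(1))
            continue
        if ch == quote:
            # Un-escaped quote: toggle the state. A doubled quote
            # toggles twice, so the state is unchanged afterwards.
            in_field = not in_field
        elif ch == "\n" and not in_field:
            # End of line is only recognized in the out-of-field state.
            return "".join(buf)
        buf.append(ch)


if __name__ == "__main__":
    data = io.StringIO('a,"line one\nline two",b\nc,d\n')
    print(repr(read_csv_line(data)))  # 'a,"line one\nline two",b'
    print(repr(read_csv_line(data)))  # 'c,d'
    print(repr(read_csv_line(data)))  # None
```

Note that the sketch does not parse fields at all. It only tracks
whether the scanner is inside a quoted field, which is the one piece of
information needed to decide whether a newline ends the line.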

I may not be explaining myself well or I may fundamentally
misunderstand how copy works. I would be happy to code the change and
send it to you for review, if you would be interested in looking it
over and it is felt to be a worthwhile capability.

Patrick B. Kelly
------------------------------------------------------
http://patrickbkelly.org
