Re: multiline CSV fields

From: Patrick B Kelly <pbk(at)patrickbkelly(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: multiline CSV fields
Date: 2004-11-12 02:40:09
Message-ID: 2B41297B-3454-11D9-B14C-000A958A3956@patrickbkelly.org
Lists: pgsql-hackers pgsql-patches


On Nov 11, 2004, at 6:16 PM, Tom Lane wrote:

> Patrick B Kelly <pbk(at)patrickbkelly(dot)org> writes:
>> What about just coding a FSM into
>> backend/commands/copy.c:CopyReadLine() that does not process any flavor
>> of NL characters when it is inside of a data field?
>
> CopyReadLine has no business tracking that. One reason why not is that
> it is dealing with data not yet converted out of the client's encoding,
> which makes matching to user-specified quote/escape characters
> difficult.
>
> regards, tom lane

I appreciate what you are saying about the encoding, and you are, of
course, right. But CopyReadLine is already processing the NL characters,
and it is doing so without considering the context in which they appear.
Unfortunately, the same character(s) are used for two different purposes
in the files in question: as line terminators and as data inside a
field. Without considering whether they appear inside or outside of data
fields, CopyReadLine will mistake one for the other and cannot correctly
do what it is already trying to do, which is break the input file into
lines.

My suggestion is simply to have CopyReadLine recognize these two states
(in-field and out-of-field) and execute the current logic only while in
the second state. It would not be too hard, but, as you mentioned, it is
non-trivial.
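
A minimal sketch of the two-state idea, to make the suggestion concrete.
This is not the existing CopyReadLine code: find_line_end and its
parameters are hypothetical names, and the sketch assumes the quote
character is a single byte that can be matched directly in the raw input,
which is exactly the encoding concern raised above. It also assumes that
a quote inside a field is escaped by doubling it, so a separate escape
character is not handled.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of copy.c.
 *
 * Scan buf[0..len) and return the offset of the first line terminator
 * ('\n' or '\r') that lies outside a quoted field. Return len if no
 * such terminator is found, meaning more input is needed.
 *
 * Assumption: quote is a single byte, and an embedded quote is written
 * as two quotes in a row. A doubled quote toggles the state twice, so
 * the scanner ends up back in the in-field state without special casing.
 */
static size_t
find_line_end(const char *buf, size_t len, char quote)
{
    bool    in_field = false;   /* true while inside a quoted field */
    size_t  i;

    for (i = 0; i < len; i++)
    {
        char c = buf[i];

        if (c == quote)
            in_field = !in_field;
        else if (!in_field && (c == '\n' || c == '\r'))
            return i;           /* real end of line */
        /* NL characters seen while in_field are treated as data */
    }
    return len;
}
```

The existing end-of-line handling would then run only at the offset this
returns. An unterminated quote makes the scan consume the rest of the
buffer, so a real implementation would need some limit or error check
there.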

Patrick B. Kelly
------------------------------------------------------
http://patrickbkelly.org
