Re: Should CSV parsing be stricter about mid-field quotes?

From: Noah Misch <noah(at)leadboat(dot)com>
To: Joel Jacobson <joel(at)compiler(dot)org>
Cc: Daniel Verite <daniel(at)manitou-mail(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Should CSV parsing be stricter about mid-field quotes?
Date: 2023-07-02 05:45:31
Message-ID: 20230702054531.GA1230904@rfd.leadboat.com
Lists: pgsql-hackers

On Sat, May 20, 2023 at 09:16:30AM +0200, Joel Jacobson wrote:
> On Fri, May 19, 2023, at 18:06, Daniel Verite wrote:
> > COPY FROM file CSV somewhat differs as your example shows,
> > but it still mishandles \. when unquoted. For instance, consider this
> > file to load with COPY t FROM '/tmp/t.csv' WITH CSV
> > $ cat /tmp/t.csv
> > line 1
> > \.
> > line 3
> > line 4
> >
> > It results in having only "line 1" being imported.
>
> Hmm, this is a problem for one of the new use-cases I brought up that would be
> possible with DELIMITER NONE QUOTE NONE, i.e. to import unstructured log files,
> where each raw line should be imported "as is" into a single text column.
>
> Is there a valid reason why \. is needed for COPY FROM filename?

No.

> It seems to me it would only be necessary for the COPY FROM STDIN case,
> since files have a natural end-of-file and a known file size.

Right. Even for COPY FROM STDIN, it's not needed anymore since the removal of
protocol v2. psql would still use it to find the end of inline COPY data,
though. Here's another relevant thread:
https://postgr.es/m/flat/bfcd57e4-8f23-4c3e-a5db-2571d09208e2%40beta.fastmail.com
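
To make the behavior in Daniel's example concrete, here is a minimal Python
sketch of a line reader that optionally treats a bare \. line as end-of-data.
It is only a toy model under stated assumptions, not PostgreSQL's COPY code:
read_rows and honor_end_marker are hypothetical names, and quoting and
delimiters are ignored entirely.

```python
import io

# The end-of-data marker discussed in this thread: a line holding only \.
END_MARKER = "\\."


def read_rows(stream, honor_end_marker=True):
    """Return the lines of `stream` as rows (toy model, hypothetical helper).

    With honor_end_marker=True, a bare END_MARKER line stops the read,
    which mimics the truncation reported for COPY FROM file in CSV mode.
    With honor_end_marker=False, only the real end of input stops it.
    """
    rows = []
    for raw in stream:
        line = raw.rstrip("\r\n")
        if honor_end_marker and line == END_MARKER:
            break  # everything after the marker is silently dropped
        rows.append(line)
    return rows


sample = "line 1\n\\.\nline 3\nline 4\n"
with_marker = read_rows(io.StringIO(sample), honor_end_marker=True)
without_marker = read_rows(io.StringIO(sample), honor_end_marker=False)
print(with_marker)
print(without_marker)
```

With the marker honored, only "line 1" survives. Relying on end-of-file
instead keeps all four lines, including the \. line as ordinary data.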
