Re: Should CSV parsing be stricter about mid-field quotes?

From: "Joel Jacobson" <joel(at)compiler(dot)org>
To: "Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com>
Cc: "Kirk Wolak" <wolakk(at)gmail(dot)com>, "Andrew Dunstan" <andrew(at)dunslane(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Should CSV parsing be stricter about mid-field quotes?
Date: 2023-05-18 07:51:13
Message-ID: 7596ab36-6bba-48f8-9fe7-290327747f4f@app.fastmail.com
Lists: pgsql-hackers

On Thu, May 18, 2023, at 08:35, Pavel Stehule wrote:
> Maybe there is another third implementation in Libre Office.
>
> Generally TSV is not well specified, and then the implementations are not consistent.

Thanks Pavel, that was a very interesting case indeed:

Libre Office (tested on Mac) doesn't have a separate TSV format,
but its CSV format allows specifying custom "Field delimiter" and
"String delimiter".

How peculiar: in Libre Office, when you type a double quotation mark
(Shift+2 on my keyboard), you don't get the normal double quotation mark,
but typographic Unicode quotes instead,
e2 80 9c ("LEFT DOUBLE QUOTATION MARK") and
e2 80 9d ("RIGHT DOUBLE QUOTATION MARK").
The exported .CSV file still uses the normal double quotation mark as
"String delimiter":

a,b,c,d,e
unquoted,“this field is quoted”,this “word” is quoted,"field with , comma",field with tab

So my "this field is quoted" experiment was exported without String
delimiters, since the typographic quotation marks are ordinary characters
that don't need to be quoted.
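
For anyone who wants to check this, here is a rough sketch using Python's
standard csv module. It is not what Libre Office does internally. The tab
in the last field is my assumption (the archived text may not preserve it),
and the variable names are only for illustration:

```python
import csv
import io

# Data row as exported above. The "\t" in the last field is an assumption.
line = (
    "unquoted,"
    "\u201cthis field is quoted\u201d,"
    "this \u201cword\u201d is quoted,"
    '"field with , comma",'
    "field with\ttab\n"
)

# The typographic quotes are three UTF-8 bytes each, unlike ASCII '"' (0x22).
left_bytes = "\u201c".encode("utf-8").hex(" ")
right_bytes = "\u201d".encode("utf-8").hex(" ")
print(left_bytes, right_bytes)

# Default dialect: delimiter ',' and quotechar '"'.
fields = next(csv.reader(io.StringIO(line)))
for field in fields:
    print(repr(field))
```

With the default dialect, only the fourth field is treated as quoted. The
typographic quotes in the second and third fields come through as plain
data characters.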
