Re: Should CSV parsing be stricter about mid-field quotes?

From: "Joel Jacobson" <joel(at)compiler(dot)org>
To: "Kirk Wolak" <wolakk(at)gmail(dot)com>
Cc: "Andrew Dunstan" <andrew(at)dunslane(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Should CSV parsing be stricter about mid-field quotes?
Date: 2023-05-18 06:00:28
Message-ID: 777be2db-f201-49d2-961b-0779f0f0d5ac@app.fastmail.com
Lists: pgsql-hackers

On Thu, May 18, 2023, at 00:18, Kirk Wolak wrote:
> Here you go. Not horrible handling. (I use DataGrip so I saved it from there
> directly as TSV, just for an extra datapoint).
>
> FWIW, if you copy/paste in windows, the data, the field with the tab gets
> split into another column in Excel. But saving it as a file, and opening it.
> Saving it as XLSX, and then having Excel save it as a TSV (versus opening a
> text file, and saving it back)

Very useful, thanks.

Interesting: unlike Excel, DataGrip doesn't quote fields containing commas in TSV.
All the DataGrip/Excel TSV variants use quoting when necessary,
unlike Google Sheets' TSV format, which doesn't quote fields at all.

DataGrip/Excel also terminate the last record with a newline,
while Google Sheets omits the newline for the last record
(which is bad, since a streaming reader then can't know
whether the last record is complete).
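
To make the streaming concern concrete, here is a minimal Python sketch.
It is only an illustration, not COPY code: `split_stream` is a hypothetical
helper, the sample data is made up, and it ignores newlines inside quoted
fields.

```python
def split_stream(chunks):
    """Split incoming text chunks into newline-terminated records.

    Returns (records, pending), where pending is trailing text that was
    never terminated by a newline.
    """
    records = []
    pending = ""
    for chunk in chunks:
        pending += chunk
        # Everything before the last "\n" is complete; the rest stays pending.
        *done, pending = pending.split("\n")
        records.extend(done)
    return records, pending


# Last record terminated with a newline: nothing is left pending.
terminated = split_stream(["a\tb\nc\t", "d\n"])

# Last record not terminated: the reader is left holding "c\td" and cannot
# tell whether the record is complete or the stream was cut off.
unterminated = split_stream(["a\tb\nc\t", "d"])
```

In the second call the final record only becomes usable once the reader
decides that end of input also means end of record.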

This makes me think we probably shouldn't add a new TSV format,
since there is no consistency between vendors.
It's impossible to deduce with certainty if a TSV-field that
begins with a double quotation mark is quoted or unquoted.
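
A small sketch of that ambiguity, using Python's standard `csv` module
rather than anything in PostgreSQL. The sample line is made up, and `parse`
is just a hypothetical helper for this example.

```python
import csv
import io

# Made-up TSV line whose first field begins with a double quotation mark.
line = '"a\tb"\tc\n'


def parse(text, quoting):
    """Parse the first record of text as tab-separated, with the given quoting mode."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=quoting)
    return next(reader)


# Reading the leading quote as a quoting character: two fields, and the
# first one contains a tab.
as_quoted = parse(line, csv.QUOTE_MINIMAL)

# Reading the same line with no quote processing: three fields, with the
# quotation marks kept as data.
as_unquoted = parse(line, csv.QUOTE_NONE)
```

Both readings are plausible for the same bytes, so a reader has to be told
which convention the producer used.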

Two alternative ideas:

1. How about adding a `WITHOUT QUOTE` or `QUOTE NONE` option in conjunction
with `COPY ... WITH CSV`?

Internally, it would just set

`quotec = '\0';`

so it wouldn't affect performance at all.
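
To sketch why that would be enough, here is a simplified Python model of a
field scanner. This is not the actual COPY code: `split_fields` is a
hypothetical helper, it ignores escaped or doubled quotes, and it assumes
the input never contains a NUL character.

```python
def split_fields(line, delim="\t", quotec='"'):
    """Split one record into fields, toggling quote state on quotec."""
    fields = []
    current = []
    in_quote = False
    for ch in line:
        if ch == quotec:
            # Quote characters only toggle state and are not kept as data.
            in_quote = not in_quote
        elif ch == delim and not in_quote:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


record = '"a\tb"\tc'

# Default quote character: the quoted tab stays inside the first field.
with_quotes = split_fields(record)

# Quote character set to NUL: the comparison never matches, so quotation
# marks are ordinary data and every tab separates fields.
without_quotes = split_fields(record, quotec="\0")
```

The loop does the same per-character comparison in both cases; only the
value it compares against changes.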

2. How about adding a note on the complexities of dealing with TSV files in the
COPY documentation?

/Joel
