From: | "Joel Jacobson" <joel(at)compiler(dot)org> |
---|---|
To: | "Kirk Wolak" <wolakk(at)gmail(dot)com> |
Cc: | "Andrew Dunstan" <andrew(at)dunslane(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Should CSV parsing be stricter about mid-field quotes? |
Date: | 2023-05-18 06:00:28 |
Message-ID: | 777be2db-f201-49d2-961b-0779f0f0d5ac@app.fastmail.com |
Lists: | pgsql-hackers |
On Thu, May 18, 2023, at 00:18, Kirk Wolak wrote:
> Here you go. Not horrible handling. (I use DataGrip so I saved it from there
> directly as TSV, just for an extra datapoint).
>
> FWIW, if you copy/paste in windows, the data, the field with the tab gets
> split into another column in Excel. But saving it as a file, and opening it.
> Saving it as XLSX, and then having Excel save it as a TSV (versus opening a
> text file, and saving it back)
Very useful, thanks.
Interesting: unlike Excel, DataGrip doesn't quote fields containing commas in TSV.
All the DataGrip/Excel TSV variants use quoting when necessary,
unlike Google Sheets's TSV format, which doesn't quote fields at all.
DataGrip/Excel also terminate the last record with a newline,
while Google Sheets omits the newline for the last record
(which is bad, since a streaming reader then can't know
whether the last record is complete or not).
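To illustrate the streaming concern, here is a minimal Python sketch. The helper name `split_complete_records` is made up for this example, and it deliberately ignores quoted newlines; it only shows that, without a trailing newline, the last record cannot be told apart from a truncated one.

```python
def split_complete_records(buffer: str):
    """Split buffered text into newline-terminated records and a leftover tail.

    Hypothetical helper: records are assumed to never contain embedded newlines.
    """
    *records, tail = buffer.split("\n")
    return records, tail

# Last record terminated by a newline: everything is known to be complete.
terminated = split_complete_records("a\tb\nc\td\n")

# Last record not terminated: "c\td" stays in the tail, because the reader
# cannot know whether more bytes of that record are still on the way.
unterminated = split_complete_records("a\tb\nc\td")
```

In the second case the reader has to wait for end-of-input before it can treat the tail as a record.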
This makes me think we probably shouldn't add a new TSV format,
since there is no consistency between vendors.
It's impossible to deduce with certainty if a TSV-field that
begins with a double quotation mark is quoted or unquoted.
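A small sketch of that ambiguity, using Python's standard `csv` module as a stand-in for two vendors' conventions (the sample line is invented for illustration): the same bytes yield different fields depending on whether the reader assumes quoting is in use.

```python
import csv
import io

# One TSV line whose first field begins with a double quotation mark.
line = '"a\tb"\tc\n'

def parse(text, quoting):
    """Parse the first record of a tab-delimited text with the given quoting mode."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=quoting)
    return next(reader)

# Reader that assumes quoting: one field containing a tab, then "c".
quoted_reading = parse(line, csv.QUOTE_MINIMAL)

# Reader that assumes no quoting: the quote characters are plain data,
# so the tab inside them splits the text into three fields.
unquoted_reading = parse(line, csv.QUOTE_NONE)
```

Both readings are valid under some vendor's rules, so the file alone doesn't tell us which one was intended.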
Two alternative ideas:
1. How about adding a `WITHOUT QUOTE` or `QUOTE NONE` option in conjunction
with `COPY ... WITH CSV`?
Internally, it would just set
`quotec = '\0';`
so it wouldn't affect performance at all.
2. How about adding a note on the complexities of dealing with TSV files in the
COPY documentation?
/Joel