From: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> |
---|---|
To: | Joel Jacobson <joel(at)compiler(dot)org> |
Cc: | Kirk Wolak <wolakk(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Should CSV parsing be stricter about mid-field quotes? |
Date: | 2023-05-18 06:35:26 |
Message-ID: | CAFj8pRBPPfmL+xhBmZha+OAyJO2zXj+28RFPJdd2wS2+pfZc_Q@mail.gmail.com |
Lists: | pgsql-hackers |
On Thu, 18 May 2023 at 8:01, Joel Jacobson <joel(at)compiler(dot)org> wrote:
> On Thu, May 18, 2023, at 00:18, Kirk Wolak wrote:
> > Here you go. Not horrible handling. (I use DataGrip, so I saved it directly
> > from there as TSV, just as an extra data point.)
> >
> > FWIW, if you copy/paste the data in Windows, the field with the tab gets
> > split into another column in Excel. But I saved it as a file and opened it,
> > saved it as XLSX, and then had Excel save it as a TSV (versus opening a
> > text file and saving it back).
>
> Very useful, thanks.
>
> Interestingly, unlike Excel, DataGrip doesn't quote fields containing commas
> in TSV.
> All the DataGrip/Excel TSV variants use quoting when necessary,
> unlike Google Sheets's TSV format, which doesn't quote fields at all.
>
Maybe LibreOffice has yet another, third implementation.
Generally, TSV is not well specified, so the implementations are not
consistent.
>
> DataGrip/Excel also terminate the last record with a newline,
> while Google Sheets omits the newline after the last record
> (which is bad, since a streaming reader then can't know
> whether the last record is complete or not).
>
> This makes me think we probably shouldn't add a new TSV format,
> since there is no consistency between vendors.
> It's impossible to deduce with certainty whether a TSV field that
> begins with a double quotation mark is quoted or unquoted.
>
> Two alternative ideas:
>
> 1. How about adding a `WITHOUT QUOTE` or `QUOTE NONE` option in conjunction
> with `COPY ... WITH CSV`?
>
> Internally, it would just set
>
> quotec = '\0';
>
> so it wouldn't affect performance at all.
>
> 2. How about adding a note on the complexities of dealing with TSV files
> in the COPY documentation?
>
> /Joel
>
>