From: | Kirk Wolak <wolakk(at)gmail(dot)com> |
---|---|
To: | Joel Jacobson <joel(at)compiler(dot)org> |
Cc: | Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Should CSV parsing be stricter about mid-field quotes? |
Date: | 2023-05-17 22:18:05 |
Message-ID: | CACLU5mSL=YSWnN787FFph-1QT3wqK9x7qcX=gvg4mqWD-4DoGA@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, May 17, 2023 at 5:47 PM Joel Jacobson <joel(at)compiler(dot)org> wrote:
> On Wed, May 17, 2023, at 19:42, Andrew Dunstan wrote:
> > You can use CSV mode pretty reliably for TSV files. The trick is to use a
> > quoting char that shouldn't appear, such as E'\x01' as well as setting the
> > delimiter to E'\t'. Yes, it's far from obvious.
>
> I've been using that trick myself many times in the past, but thanks to this
> deep-dive into this topic, it looks to me like TEXT would be a better format
> fit when dealing with unquoted TSV files, or?
>
> OTOH, one would then need to inspect the TSV file doesn't contain \. on an
> empty line...
>
> I was about to suggest we perhaps should consider adding a TSV format, that
> is like TEXT excluding the PostgreSQL specific things like \. and \N,
> but then I tested exporting TSV from Numbers on Mac and Google Sheets,
> and I can see there are incompatible differences. Numbers quotes fields
> that contain double-quote marks, while Google Sheets doesn't.
> Neither of them (unsurprisingly) uses midfield quoting though.
>
> Anyone using Excel that could try exporting the following example as
> CSV/TSV?
>
> CREATE TABLE t (a text, b text, c text, d text);
> INSERT INTO t (a, b, c, d)
> VALUES ('unquoted','a "quoted" string', 'field, with a comma', E'field\t with a tab');
>
>
Here you go. Not horrible handling. (I use DataGrip, so I also saved it from
there directly as TSV, just for an extra data point.)
FWIW, if you copy/paste the data in Windows, the field with the tab gets
split into another column in Excel, but not when saving it as a file and
opening it.
I also saved it as XLSX and then had Excel save that as a TSV (versus opening
a text file and saving it back).
Kirk...
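
For anyone who wants to sanity-check exported TSV files outside PostgreSQL, here
is a minimal sketch of the quoting trick Andrew describes above (tab as
delimiter, a quote character such as \x01 that should never appear in the data).
It uses Python's csv module rather than COPY, so it only illustrates the idea
and is not a statement about how COPY itself parses. The sample row and the
helper name parse_tsv are made up for illustration.

```python
import csv
import io

# Hypothetical sample row, loosely modeled on the test table in this thread:
# four tab-separated fields, one of which starts with a double quote.
SAMPLE = 'unquoted\t"quoted" at start\tfield, with a comma\tlast\n'


def parse_tsv(text, quotechar):
    """Parse tab-separated text with Python's csv module (not PostgreSQL COPY)."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quotechar=quotechar)
    return list(reader)


# Double quote as the quote character: the quotes are interpreted and
# stripped, so the field content is silently altered.
with_dquote = parse_tsv(SAMPLE, '"')

# Quote character set to \x01, which is assumed never to occur in the data:
# double quotes are now ordinary characters and survive unchanged.
with_ctrl_a = parse_tsv(SAMPLE, "\x01")

print(with_dquote[0][1])   # quoted at start
print(with_ctrl_a[0][1])   # "quoted" at start
```

Running the attached files through something like this with both settings
should show quickly whether a given export relies on quoting at all.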
Attachment | Content-Type | Size |
---|---|---|
t_xlsx_saved_as_tsv..txt | text/plain | 83 bytes |
t_test_excel.csv | text/csv | 86 bytes |
t_test_excel.tsv.txt | text/plain | 83 bytes |
t_test_DataGrip.tsv | application/octet-stream | 79 bytes |