Re: Should CSV parsing be stricter about mid-field quotes?

From: Kirk Wolak <wolakk(at)gmail(dot)com>
To: Joel Jacobson <joel(at)compiler(dot)org>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Should CSV parsing be stricter about mid-field quotes?
Date: 2023-05-17 22:18:05
Message-ID: CACLU5mSL=YSWnN787FFph-1QT3wqK9x7qcX=gvg4mqWD-4DoGA@mail.gmail.com
Lists: pgsql-hackers

On Wed, May 17, 2023 at 5:47 PM Joel Jacobson <joel(at)compiler(dot)org> wrote:

> On Wed, May 17, 2023, at 19:42, Andrew Dunstan wrote:
> > You can use CSV mode pretty reliably for TSV files. The trick is to use a
> > quoting char that shouldn't appear, such as E'\x01' as well as setting the
> > delimiter to E'\t'. Yes, it's far from obvious.
>
> I've been using that trick myself many times in the past, but thanks to
> this deep-dive into this topic, it looks to me like TEXT would be a better
> format fit when dealing with unquoted TSV files, or?
>
> OTOH, one would then need to check that the TSV file doesn't contain \. on
> an empty line...
>
> I was about to suggest we perhaps should consider adding a TSV format, that
> is like TEXT excluding the PostgreSQL specific things like \. and \N,
> but then I tested exporting TSV from Numbers on Mac and Google Sheets,
> and I can see there are incompatible differences. Numbers quotes fields
> that contain double-quote marks, while Google Sheets doesn't.
> Neither of them (unsurprisingly) uses mid-field quoting though.
>
> Anyone using Excel that could try exporting the following example as CSV/TSV?
>
> CREATE TABLE t (a text, b text, c text, d text);
> INSERT INTO t (a, b, c, d)
> VALUES ('unquoted', 'a "quoted" string', 'field, with a comma',
>         E'field\t with a tab');
>
>
Here you go. Not horrible handling. (I use DataGrip, so I also saved it from
there directly as TSV, just for an extra datapoint.)

FWIW, if you copy/paste the data in Windows, the field with the tab gets
split into another column in Excel.
That does not happen when saving it as a file and opening it.
I also saved it as XLSX and then had Excel save it as a TSV (versus opening a
text file and saving it back).

Kirk...

Attachment                  Content-Type              Size
t_xlsx_saved_as_tsv..txt    text/plain                83 bytes
t_test_excel.csv            text/csv                  86 bytes
t_test_excel.tsv.txt        text/plain                83 bytes
t_test_DataGrip.tsv         application/octet-stream  79 bytes
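
For readers following along, the CSV-mode trick quoted above might look
roughly like this when loading a TSV file. This is only a sketch: the file
path is hypothetical, the table t is the one from the quoted example, and it
assumes the byte 0x01 never occurs in the data.

```sql
-- Sketch only: '/tmp/t_test.tsv' is a hypothetical path, not one of the
-- attachments. Table t is the one created in the quoted example.
COPY t (a, b, c, d)
FROM '/tmp/t_test.tsv'
WITH (
    FORMAT csv,
    DELIMITER E'\t',   -- fields are separated by tabs
    QUOTE E'\x01'      -- quoting char assumed never to appear in the data
);
```

With the quote character set to a byte that does not occur in the file,
double-quote marks in the data are kept as ordinary characters instead of
being interpreted as quoting.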
