Re: Should CSV parsing be stricter about mid-field quotes?

From: "Joel Jacobson" <joel(at)compiler(dot)org>
To: "Andrew Dunstan" <andrew(at)dunslane(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Should CSV parsing be stricter about mid-field quotes?
Date: 2023-05-17 21:45:54
Message-ID: 5258a2fb-2590-49ec-a482-815c4e159675@app.fastmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, May 17, 2023, at 19:42, Andrew Dunstan wrote:
> You can use CSV mode pretty reliably for TSV files. The trick is to use a
> quoting char that shouldn't appear, such as E'\x01' as well as setting the
> delimiter to E'\t'. Yes, it's far from obvious.

I've been using that trick myself many times in the past, but thanks to this
deep-dive into this topic, it looks to me like TEXT would be a better format
fit when dealing with unquoted TSV files, or?

OTOH, one would then need to inspect the TSV file doesn't contain \. on an empty
line...

I was about to suggest we perhaps should consider adding a TSV format, that
is like TEXT excluding the PostgreSQL specific things like \. and \N,
but then I tested exporting TSV from Numbers on Mac and Google Sheets,
and I can see there are incompatible differences. Numbers quote fields
that contain double-quote marks, while Google Sheets doesn't.
None of them (unsurpringly) uses midfield quoting though.

Anyone using Excel that could try exporting the following example as CSV/TSV?

CREATE TABLE t (a text, b text, c text, d text);
INSERT INTO t (a, b, c, d)
VALUES ('unquoted','a "quoted" string', 'field, with a comma', E'field\t with a tab');

I agree with you that it's unwise to change something that's been working
great for such a long time, and I agree CSV-files are probably not a problem
per se, but I think you will agree with me TSV-files is a different story,
from a user-friendliness and correctness perspective. Sure, we could just say
"Don't use TSV! Use CSV instead!" in the docs, that would be an improvement
I think, but there is currently nothing on "TSV" in the docs, so users will
google and find all these broken dangerous suggestions on work-arounds.

Thoughts?

/Joel

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2023-05-17 21:51:41 Re: issue with meson builds on msys2
Previous Message Kirk Wolak 2023-05-17 21:23:00 Re: psql: Could we get "-- " prefixing on the **** QUERY **** outputs? (ECHO_HIDDEN)