From: | "Joel Jacobson" <joel(at)compiler(dot)org> |
---|---|
To: | "Andrew Dunstan" <andrew(at)dunslane(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Should CSV parsing be stricter about mid-field quotes? |
Date: | 2023-05-17 21:45:54 |
Message-ID: | 5258a2fb-2590-49ec-a482-815c4e159675@app.fastmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, May 17, 2023, at 19:42, Andrew Dunstan wrote:
> You can use CSV mode pretty reliably for TSV files. The trick is to use a
> quoting char that shouldn't appear, such as E'\x01' as well as setting the
> delimiter to E'\t'. Yes, it's far from obvious.
I've been using that trick myself many times in the past, but thanks to this
deep-dive into this topic, it looks to me like TEXT would be a better format
fit when dealing with unquoted TSV files, or?
OTOH, one would then need to inspect the TSV file doesn't contain \. on an empty
line...
I was about to suggest we perhaps should consider adding a TSV format, that
is like TEXT excluding the PostgreSQL specific things like \. and \N,
but then I tested exporting TSV from Numbers on Mac and Google Sheets,
and I can see there are incompatible differences. Numbers quote fields
that contain double-quote marks, while Google Sheets doesn't.
None of them (unsurpringly) uses midfield quoting though.
Anyone using Excel that could try exporting the following example as CSV/TSV?
CREATE TABLE t (a text, b text, c text, d text);
INSERT INTO t (a, b, c, d)
VALUES ('unquoted','a "quoted" string', 'field, with a comma', E'field\t with a tab');
I agree with you that it's unwise to change something that's been working
great for such a long time, and I agree CSV-files are probably not a problem
per se, but I think you will agree with me TSV-files is a different story,
from a user-friendliness and correctness perspective. Sure, we could just say
"Don't use TSV! Use CSV instead!" in the docs, that would be an improvement
I think, but there is currently nothing on "TSV" in the docs, so users will
google and find all these broken dangerous suggestions on work-arounds.
Thoughts?
/Joel
From | Date | Subject | |
---|---|---|---|
Next Message | Andrew Dunstan | 2023-05-17 21:51:41 | Re: issue with meson builds on msys2 |
Previous Message | Kirk Wolak | 2023-05-17 21:23:00 | Re: psql: Could we get "-- " prefixing on the **** QUERY **** outputs? (ECHO_HIDDEN) |