Re: Should CSV parsing be stricter about mid-field quotes?

From: "Joel Jacobson" <joel(at)compiler(dot)org>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Should CSV parsing be stricter about mid-field quotes?
Date: 2023-05-16 17:15:03
Message-ID: 92a9ae94-6884-4860-ab5f-3995c6f2b137@app.fastmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, May 16, 2023, at 13:43, Joel Jacobson wrote:
>If we made midfield quoting a CSV error, those users who are currently mistaken
>about their TSV/TEXT files being CSV while also having balanced quotes in their
>data, would encounter an error rather than a silent failure, which I believe
>would be an enhancement.

Furthermore, I think it could be beneficial to add a HINT message for all type
of CSV/TEXT parsing errors, since the precise ERROR messages might just cause
the user to tinker with the options until it works, instead of carefully reading
through the documentation on the various formats.

Perhaps something like this:

HINT: Are you sure the FORMAT matches your input?

Also, the COPY documentation says nothing about TSV, and I know TEXT isn't
exactly TSV, but it's at least much more TSV than CSV, so maybe we should
describe the differences, such as \N. I think the best advise to users would be
to avoid exporting to .TSV and use .CSV instead, since I've noticed e.g.
Google Sheets to replace newlines in fields with blank space when
exporting .TSV, which effectively destroys data.

The first search results for "postgresql tsv" on Google link to postgresql.org
pages, but the COPY docs are not one of them unfortunately.

The first relevant hit is this one:

"Importing a TSV File into Postgres | by Riley Wong" [1]

Sadly, this author has also misunderstood how to properly import a .TSV file,
he got it all wrong, and doesn't understand or at least doesn't mention there
are more differences than just the delimiter:

COPY listings
FROM '/home/ec2-user/list.tsv'
DELIMITER E'\t'
CSV HEADER;

I must confess I have used PostgreSQL for over two decades without having really
understood the detailed differences between TEXT and CSV, until recently.

[1] https://medium.com/@rlwong2/importing-a-tsv-file-into-postgres-364572a004bf

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2023-05-16 18:11:14 Re: Schema variables - new implementation for Postgres 15
Previous Message Jeff Davis 2023-05-16 16:47:51 Re: Order changes in PG16 since ICU introduction