From: | Andrew Dunstan <andrew(at)dunslane(dot)net> |
---|---|
To: | Joel Jacobson <joel(at)compiler(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Should CSV parsing be stricter about mid-field quotes? |
Date: | 2023-05-12 19:57:06 |
Message-ID: | e819612f-f75f-ec88-0d0c-d63ffb6c8745@dunslane.net |
Lists: | pgsql-hackers |
On 2023-05-11 Th 10:03, Joel Jacobson wrote:
> Hi hackers,
>
> I've come across an unexpected behavior in our CSV parser that I'd like to
> bring up for discussion.
>
> % cat example.csv
> id,rating,review
> 1,5,"Great product, will buy again."
> 2,3,"I bought this for my 6" laptop but it didn't fit my 8" tablet"
>
> % psql
> CREATE TABLE reviews (id int, rating int, review text);
> \COPY reviews FROM example.csv WITH CSV HEADER;
> SELECT * FROM reviews;
>
> This gives:
>
> id | rating | review
> ----+--------+-------------------------------------------------------------
> 1 | 5 | Great product, will buy again.
> 2 | 3 | I bought this for my 6 laptop but it didn't fit my 8 tablet
> (2 rows)
Maybe this is unexpected to you, but it's not to me. What other sane
interpretation of that data could there be? And what CSV producer
outputs such horrible content? As you've noted, ours certainly does not.
Our rules are clear: quotes within quotes must be escaped (default
escape is by doubling the quote char). Allowing partial fields to be
quoted was a deliberate decision when CSV parsing was implemented,
because examples have been seen in the wild.
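To make those rules concrete, here is a minimal Python sketch that models them. It is a simplified illustration under stated assumptions, not PostgreSQL's actual C code for COPY. The function name `parse_csv_line` is hypothetical, and the sketch handles a single line only, ignoring NULL handling, quoted fields that span lines, and encodings. A quote character opens a quoted section anywhere in a field. Inside that section an escape followed by a quote (by default a doubled quote) yields one literal quote, and any other quote closes the section while the field carries on.

```python
def parse_csv_line(line: str, delim: str = ",", quote: str = '"',
                   escape: str = '"') -> list[str]:
    """Hypothetical helper: split one CSV line using lenient quoting rules.

    A simplified model for illustration, not PostgreSQL's implementation.
    """
    fields: list[str] = []
    buf: list[str] = []
    in_quote = False
    i, n = 0, len(line)
    while i < n:
        c = line[i]
        if in_quote:
            nxt = line[i + 1] if i + 1 < n else None
            if c == escape and nxt in (quote, escape):
                # Escaped char (by default a doubled quote): keep one literal copy.
                buf.append(nxt)
                i += 2
                continue
            if c == quote:
                # Unescaped quote closes the quoted section; the field goes on.
                in_quote = False
            else:
                buf.append(c)
        elif c == delim:
            # Delimiter outside quotes ends the current field.
            fields.append("".join(buf))
            buf = []
        elif c == quote:
            # A quote anywhere in a field opens a quoted section
            # (partial-field quoting is accepted).
            in_quote = True
        else:
            buf.append(c)
        i += 1
    if in_quote:
        raise ValueError("unterminated quoted field")
    fields.append("".join(buf))
    return fields


if __name__ == "__main__":
    bad = '2,3,"I bought this for my 6" laptop but it didn\'t fit my 8" tablet"'
    good = '2,3,"I bought this for my 6"" laptop but it didn\'t fit my 8"" tablet"'
    print(parse_csv_line(bad))   # inner quotes swallowed, as in the psql output
    print(parse_csv_line(good))  # doubled quotes survive as literal quotes
```

Fed the second row of the example file, this model produces the same value psql showed, with both inner quotes consumed as section delimiters. The properly escaped form, with each inner quote doubled, keeps them as literal characters.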
So I don't think our behaviour is broken or needs fixing. As mentioned
by Greg, this is an example of the adage about being liberal in what you
accept.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com