From: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> |
---|---|
To: | Joel Jacobson <joel(at)compiler(dot)org> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Should CSV parsing be stricter about mid-field quotes? |
Date: | 2023-05-11 14:30:05 |
Message-ID: | CAFj8pRCxAxFMbg+hgKcZ0+_+K2dELcgL2gfz8TbHzim6PfMrzQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
čt 11. 5. 2023 v 16:04 odesílatel Joel Jacobson <joel(at)compiler(dot)org> napsal:
> Hi hackers,
>
> I've come across an unexpected behavior in our CSV parser that I'd like to
> bring up for discussion.
>
> % cat example.csv
> id,rating,review
> 1,5,"Great product, will buy again."
> 2,3,"I bought this for my 6" laptop but it didn't fit my 8" tablet"
>
> % psql
> CREATE TABLE reviews (id int, rating int, review text);
> \COPY reviews FROM example.csv WITH CSV HEADER;
> SELECT * FROM reviews;
>
> This gives:
>
> id | rating | review
> ----+--------+-------------------------------------------------------------
> 1 | 5 | Great product, will buy again.
> 2 | 3 | I bought this for my 6 laptop but it didn't fit my 8 tablet
> (2 rows)
>
> The parser currently accepts quoting within an unquoted field. This can
> lead to
> data misinterpretation when the quote is part of the field data (e.g.,
> for inches, like in the example).
>
> Our CSV output rules quote an entire field or not at all. But the import of
> fields with mid-field quotes might lead to surprising and undetected
> outcomes.
>
> I think we should throw a parsing error for unescaped mid-field quotes,
> and add a COPY option like ALLOW_MIDFIELD_QUOTES for cases where mid-field
> quotes are necessary. The error message could suggest this option when it
> encounters an unescaped mid-field quote.
>
> I think the convenience of not having to use an extra option doesn't
> outweigh
> the risk of undetected data integrity issues.
>
> Thoughts?
>
+1
Pavel
> /Joel
>
>
From | Date | Subject | |
---|---|---|---|
Next Message | Robert Haas | 2023-05-11 15:50:24 | Re: v16 regression - wrong query results with LEFT JOINs + join removal |
Previous Message | Robert Haas | 2023-05-11 14:14:54 | Re: v16 regression - wrong query results with LEFT JOINs + join removal |