Re: Should CSV parsing be stricter about mid-field quotes?

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Joel Jacobson <joel(at)compiler(dot)org>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Should CSV parsing be stricter about mid-field quotes?
Date: 2023-05-11 14:30:05
Message-ID: CAFj8pRCxAxFMbg+hgKcZ0+_+K2dELcgL2gfz8TbHzim6PfMrzQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

čt 11. 5. 2023 v 16:04 odesílatel Joel Jacobson <joel(at)compiler(dot)org> napsal:

> Hi hackers,
>
> I've come across an unexpected behavior in our CSV parser that I'd like to
> bring up for discussion.
>
> % cat example.csv
> id,rating,review
> 1,5,"Great product, will buy again."
> 2,3,"I bought this for my 6" laptop but it didn't fit my 8" tablet"
>
> % psql
> CREATE TABLE reviews (id int, rating int, review text);
> \COPY reviews FROM example.csv WITH CSV HEADER;
> SELECT * FROM reviews;
>
> This gives:
>
> id | rating | review
> ----+--------+-------------------------------------------------------------
> 1 | 5 | Great product, will buy again.
> 2 | 3 | I bought this for my 6 laptop but it didn't fit my 8 tablet
> (2 rows)
>
> The parser currently accepts quoting within an unquoted field. This can
> lead to
> data misinterpretation when the quote is part of the field data (e.g.,
> for inches, like in the example).
>
> Our CSV output rules quote an entire field or not at all. But the import of
> fields with mid-field quotes might lead to surprising and undetected
> outcomes.
>
> I think we should throw a parsing error for unescaped mid-field quotes,
> and add a COPY option like ALLOW_MIDFIELD_QUOTES for cases where mid-field
> quotes are necessary. The error message could suggest this option when it
> encounters an unescaped mid-field quote.
>
> I think the convenience of not having to use an extra option doesn't
> outweigh
> the risk of undetected data integrity issues.
>
> Thoughts?
>

+1

Pavel

> /Joel
>
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2023-05-11 15:50:24 Re: v16 regression - wrong query results with LEFT JOINs + join removal
Previous Message Robert Haas 2023-05-11 14:14:54 Re: v16 regression - wrong query results with LEFT JOINs + join removal