Re: Fixing backslash dot for COPY FROM...CSV

From: "Daniel Verite" <daniel(at)manitou-mail(dot)org>
To: "Robert Haas" <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Fixing backslash dot for COPY FROM...CSV
Date: 2024-01-24 16:01:15
Message-ID: 90523252-5831-4541-b3f6-0b3e03d428d2@manitou-mail.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Robert Haas wrote:

> Those links unfortunately seem not to be entirely specific to this
> issue. Other, related things seem to be discussed there, and it's not
> obvious that everyone agrees on what to do, or really that anyone
> agrees on what to do. The best link that I found for this exact issue
> is
> https://www.postgresql.org/message-id/E1TdNVQ-0001ju-GO@wrigleys.postgresql.org
> but the thread isn't very conclusive and is so old that any
> conclusions reached then might no longer be considered valid today.

To refresh the problem statement, 4 cases that need fixing as
of HEAD can be distinguished:

#1. copy csv from file, single column, no quoting involved.
COPY will stop at \. and ignore the rest of the file without
any error or warning.

$ cat >/tmp/file.csv <<EOF
line1
\.
line2
EOF

$ psql <<EOF
create table contents(t text);
copy contents from '/tmp/file.csv' (format csv);
table contents;
EOF

Results in
t
-------
line1
(1 row)

The bug is that a single row is imported instead of the 3 rows of the file.

#2. Same as the previous case, but with file_fdw

$ psql <<EOF
CREATE EXTENSION file_fdw;

CREATE FOREIGN TABLE csv_data(line text) SERVER myserver
OPTIONS (filename '/tmp/file.csv', format 'csv');

TABLE csv_data;
EOF

Results in:

line
-------
line1
(1 row)

The bug is that rows 2 and 3 are missing, as in case #1.

#3. \copy csv from file with \. inside a quoted multi-line section

This is the case that the above linked report mentioned,
resulting in a failure to import.
In addition to being legal CSV, these contents can be produced by
Postgres itself exporting in CSV.

$ cat >/tmp/file-quoted.csv <<EOF
line1
"
\.
"
line2
EOF

$ psql <<EOF
create table contents(t text);
\copy contents from '/tmp/file-quoted.csv' csv;
EOF

Results in an error:

ERROR: unterminated CSV quoted field
CONTEXT: COPY contents, line 4: ""
\.
"

The expected result is that it imports 3 rows without error.

#4. \copy csv from file, single column, no quoting involved
This is the same case as #1 except it uses the client-server protocol.

$ cat >/tmp/file.csv <<EOF
line1
\.
line2
EOF

$ psql <<EOF
create table contents(t text);
\copy contents from '/tmp/file.csv' (format csv);
TABLE contents;
EOF

Results in
t
-------
line1
(1 row)

As in case #1, a single row is imported instead of 3 rows.

The proposed patch addresses these cases by making the sequence
\. non-special in CSV (in fact backslash itself is a normal character in
CSV).
It does not address the cases when the data is embedded after
the COPY command or typed interactively in psql, since these cases
require an explicit end-of-data marker, and CSV does not have
the concept of an end-of-data marker.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2024-01-24 16:13:13 Re: Support TZ format code in to_timestamp()
Previous Message Jelte Fennema-Nio 2024-01-24 15:58:17 Re: SSL tests fail on OpenSSL v3.2.0