Re: Fixing backslash dot for COPY FROM...CSV

From: "Daniel Verite" <daniel(at)manitou-mail(dot)org>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Robert Haas" <robertmhaas(at)gmail(dot)com>,pgsql-hackers(at)postgresql(dot)org
Subject: Re: Fixing backslash dot for COPY FROM...CSV
Date: 2024-04-05 12:58:39
Message-ID: 918e339c-c253-4366-847b-96f926ba46a0@manitou-mail.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tom Lane wrote:

> I've looked over this patch and I generally agree that this is a
> reasonable solution.

Thanks for reviewing this!

> I'm also wondering why the patch adds a test for
> "PQprotocolVersion(conn) >= 3" in handleCopyIn.

I've removed this in the attached update.

> I concur with Robert's doubts about some of the doc changes though.
> In particular, since we're not changing the behavior for non-CSV
> mode, we shouldn't remove any statements that apply to non-CSV mode.

I don't quite understand the issues with the doc changes. Let me
detail the changes.

The first change is under
<refsect2>
<title>CSV Format</title>
so it does no concern non-CSV modes.

--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -761,11 +761,7 @@ COPY <replaceable class="parameter">count</replaceable>
format, <literal>\.</literal>, the end-of-data marker, could also appear
as a data value. To avoid any misinterpretation, a <literal>\.</literal>
data value appearing as a lone entry on a line is automatically
- quoted on output, and on input, if quoted, is not interpreted as the
- end-of-data marker. If you are loading a file created by another
- application that has a single unquoted column and might have a
- value of <literal>\.</literal>, you might need to quote that value in
the
- input file.
+ quoted on output.
</para>

The part about quoting output is kept because the code still does that.

About this bit:
"and on input, if quoted, is not interpreted as the end-of-data marker."

So the patch removes that part. The reason is \. is now not interpreted as
EOD in both cases, quoted or unquoted, conforming to spec.
Previously, what the reader should have understood by "if quoted, ..."
is that it implies "if not quoted, then .\ will be interpreted as EOD
even though that behavior does not conform to the CSV spec".
If we documented the new behavior, it would be something like:
when quoted, it works as expected, and when unquoted, it works as
expected too. Isn't it simpler not to say anything?

About the next sentence:
"If you are loading a file created by another application
that has a single unquoted column and might have a value of \., you
might need to quote that value in the input file."

It's removed as well because it's not longer necessary
to do that.

The other hunk is in psql doc:

--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -1119,10 +1119,6 @@ INSERT INTO tbl1 VALUES ($1, $2) \bind 'first value'
'second value' \g
destination, because all data must pass through the client/server
connection. For large amounts of data the <acronym>SQL</acronym>
command might be preferable.
- Also, because of this pass-through method, <literal>\copy
- ... from</literal> in <acronym>CSV</acronym> mode will erroneously
- treat a <literal>\.</literal> data value alone on a line as an
- end-of-input marker.

That behavior no longer happens, so this gets removed as well.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite

Attachment Content-Type Size
v4-0001-Support-backslash-dot-on-a-line-by-itself-as-vali.patch text/plain 8.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2024-04-05 12:58:44 Re: Popcount optimization using AVX512
Previous Message Dagfinn Ilmari Mannsåker 2024-04-05 12:57:04 Re: IPC::Run::time[r|out] vs our TAP tests