Re: backslash-dot quoting in COPY CSV

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Daniel Verite <daniel(at)manitou-mail(dot)org>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: backslash-dot quoting in COPY CSV
Date: 2019-01-25 03:09:30
Message-ID: 20190125030930.GC22001@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 2, 2019 at 04:58:35PM +0100, Daniel Verite wrote:
> Hi,
>
> The doc on COPY CSV says about the backslash-dot sequence:
>
> To avoid any misinterpretation, a \. data value appearing as a
> lone entry on a line is automatically quoted on output, and on
> input, if quoted, is not interpreted as the end-of-data marker
>
> However this quoting does not happen when \. is already part
> of a quoted field. Example:
>
> COPY (select 'somevalue', E'foo\n\\.\nbar') TO STDOUT CSV;
>
> outputs:
>
> somevalue,"foo
> \.
> bar"
>
> which conforms to the CSV rules, by which we are not allowed
> to replace \. by anything AFAICS.
> The trouble is, when trying to import this back with COPY FROM,
> it will error out at the backslash-dot and not import anything.
> Furthermore, if these data are meant to be embedded into a
> script, it creates a potential risk of SQL injection.
>
> It is a known issue? I haven't found previous discussions on this.
> It looks to me like the ability of backslash-dot to be an end-of-data
> marker should be neutralizable for CSV. When the data is not embedded,
> it's not needed anyway, and when it's embedded, we could surely think
> of alternatives.

I was unable to reproduce the failure here using files:

CREATE TABLE test (x TEXT);
INSERT INTO test VALUES (E'foo\n\\.\nbar');

COPY test TO STDOUT CSV;
"foo
\.
bar"

COPY test TO '/u/postgres/tmp/x' CSV;

COPY test FROM '/u/postgres/tmp/x' CSV;

SELECT * FROM test;
x
-----
foo+
\. +
bar
foo+
\. +
bar

but I am able to see the failure using STDIN:

COPY test FROM STDIN CSV;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
"foo
\.
ERROR: unterminated CSV quoted field
CONTEXT: COPY test, line 1: ""foo

This seems like a bug to me. Looking at the code, psql issues the
prompts for STDIN, but when it sees \. alone on a line, it has no idea
you are in a quoted CSV string, so it thinks the copy is done and sends
the result to the server. I can't see an easy way to fix this. I guess
we could document it.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Chapman Flack 2019-01-25 04:46:29 Re: PostgreSQL vs SQL/XML Standards
Previous Message Amit Kapila 2019-01-25 02:50:03 Re: WIP: Avoid creation of the free space map for small tables