| From: | Bruce Momjian <bruce(at)momjian(dot)us> | 
|---|---|
| To: | Daniel Verite <daniel(at)manitou-mail(dot)org> | 
| Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org | 
| Subject: | Re: backslash-dot quoting in COPY CSV | 
| Date: | 2019-01-25 03:09:30 | 
| Message-ID: | 20190125030930.GC22001@momjian.us | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
On Wed, Jan  2, 2019 at 04:58:35PM +0100, Daniel Verite wrote:
>  Hi,
> 
> The doc on COPY CSV says about the backslash-dot sequence:
> 
>   To avoid any misinterpretation, a \. data value appearing as a
>   lone entry on a line is automatically quoted on output, and on
>   input, if quoted, is not interpreted as the end-of-data marker
> 
> However this quoting does not happen when \. is already part
> of a quoted field. Example:
> 
> COPY (select 'somevalue', E'foo\n\\.\nbar') TO STDOUT CSV;
> 
> outputs:
> 
> somevalue,"foo
> \.
> bar"
> 
> which conforms to the CSV rules, by which we are not allowed
> to replace \. by anything AFAICS.
> The trouble is, when trying to import this back with COPY FROM,
> it will error out at the backslash-dot and not import anything.
> Furthermore, if these data are meant to be embedded into a
> script, it creates a potential risk of SQL injection.
> 
> It is a known issue? I haven't found previous discussions on this.
> It looks to me like the ability of backslash-dot to be an end-of-data
> marker should be neutralizable for CSV. When the data is not embedded,
> it's not needed anyway, and when it's embedded, we could surely think
> of alternatives.
I was unable to reproduce the failure here using files:
	CREATE TABLE test (x TEXT);
	INSERT INTO test VALUES (E'foo\n\\.\nbar');
	COPY test TO STDOUT CSV;
	"foo
	\.
	bar"
	COPY test TO '/u/postgres/tmp/x' CSV;
	
	COPY test FROM '/u/postgres/tmp/x' CSV;
	SELECT * FROM test;
	  x
	-----
	 foo+
	 \. +
	 bar
	 foo+
	 \. +
	 bar
but I am able to see the failure using STDIN:
	COPY test FROM STDIN CSV;
	Enter data to be copied followed by a newline.
	End with a backslash and a period on a line by itself, or an EOF signal.
	"foo
	\.
	ERROR:  unterminated CSV quoted field
	CONTEXT:  COPY test, line 1: ""foo
This seems like a bug to me.  Looking at the code, psql issues the
prompts for STDIN, but when it sees \. alone on a line, it has no idea
you are in a quoted CSV string, so it thinks the copy is done and sends
the result to the server.  I can't see an easy way to fix this.  I guess
we could document it.
-- 
  Bruce Momjian  <bruce(at)momjian(dot)us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com
+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Chapman Flack | 2019-01-25 04:46:29 | Re: PostgreSQL vs SQL/XML Standards | 
| Previous Message | Amit Kapila | 2019-01-25 02:50:03 | Re: WIP: Avoid creation of the free space map for small tables |