Re: backslash-dot quoting in COPY CSV

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Daniel Verite <daniel(at)manitou-mail(dot)org>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: backslash-dot quoting in COPY CSV
Date: 2019-01-30 17:50:59
Message-ID: 20190130175059.GA26141@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 30, 2019 at 06:32:11PM +0100, Daniel Verite wrote:
> Bruce Momjian wrote:
> > Well, these all kind of require a change to the COPY format, which
> > hasn't changed in many years.
>
> Not for the first two. As an example of solution #2, it could look like this:
>
> =# \set INLINE_COPY_BOUNDARY ==JuQW3gc2mQjXuvmJ32TlOLhJ3F2Eh2LcsBup0oH7==
> =# COPY table FROM STDIN CSV;
> somevalue,"foo
> \.
> bar"
> ==JuQW3gc2mQjXuvmJ32TlOLhJ3F2Eh2LcsBup0oH7==
>
> Instead of looking for \. on a line by itself, psql would look for the
> boundary to know where the data ends.
> The boundary is not transmitted to the server, it has no need to know
> about it.

Wow, that is an odd API, as you stated below.

> > > - psql could be told somehow that the next piece of inline data is in
> > > the CSV format, and then pass it through a CSV parser.
> >
> > That might be the cleanest solution, but how would we actually input
> > multi-line data in CSV mode with \. alone on a line?
>
> With this solution, the content doesn't change at all.
> The weird part would be the user interface, because the information
> psql needs is not only "CSV", it's also the options DELIMITER, QUOTE,
> ESCAPE and possibly ENCODING. Currently it doesn't know any of these,
> they're passed to the server in an opaque, unparsed form within
> the COPY command.
>
> Personally, the solution I find cleaner is the server side not having
> any end-of-data marker for CSV. So backslash-dot would never be
> special. psql could allow for a custom ending boundary for in-script
> data, and users could set that to backslash-dot if they want, but that
> would be their choice.
> That would be clearly not backward compatible, and I believe it wouldn't
> work with the v2 protocol, so I'm not sure it would have much chance of
> approval.

I had forgotten that the DELIMITER and QUOTE can be changed --- that
kills the idea of adding a simple CSV parser into psql because we would
have to parse the COPY SQL command as well.

I am wondering if we should just disallow CSV from STDIN, on security
grounds. How big a problem would that be for people? Would we have to
disable to STDOUT as well since it could not be restored? Should we
issue some kind of security warning in such cases? Should we document
this?

In hindsight, I am not sure how we could have designed this more
securly. I guess we could have required some special text to start all
CSV continuation lines that were not end-of-file, but that would have
been very unportable, which is the goal of CSV.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2019-01-30 18:03:18 Re: backslash-dot quoting in COPY CSV
Previous Message Bossart, Nathan 2019-01-30 17:45:58 Re: A few new options for vacuumdb