Re: backslash-dot quoting in COPY CSV

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Daniel Verite <daniel(at)manitou-mail(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: backslash-dot quoting in COPY CSV
Date: 2019-01-30 18:03:18
Message-ID: CAFj8pRB00Tymx+UO9vKuZG2-JvWAKjwXv63oyegsFTYZGSShOA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

st 30. 1. 2019 18:51 odesílatel Bruce Momjian <bruce(at)momjian(dot)us> napsal:

> On Wed, Jan 30, 2019 at 06:32:11PM +0100, Daniel Verite wrote:
> > Bruce Momjian wrote:
> > > Well, these all kind of require a change to the COPY format, which
> > > hasn't changed in many years.
> >
> > Not for the first two. As an example of solution #2, it could look like
> this:
> >
> > =# \set INLINE_COPY_BOUNDARY ==JuQW3gc2mQjXuvmJ32TlOLhJ3F2Eh2LcsBup0oH7==
> > =# COPY table FROM STDIN CSV;
> > somevalue,"foo
> > \.
> > bar"
> > ==JuQW3gc2mQjXuvmJ32TlOLhJ3F2Eh2LcsBup0oH7==
> >
> > Instead of looking for \. on a line by itself, psql would look for the
> > boundary to know where the data ends.
> > The boundary is not transmitted to the server, it has no need to know
> > about it.
>
> Wow, that is an odd API, as you stated below.
>
> > > > - psql could be told somehow that the next piece of inline data is in
> > > > the CSV format, and then pass it through a CSV parser.
> > >
> > > That might be the cleanest solution, but how would we actually input
> > > multi-line data in CSV mode with \. alone on a line?
> >
> > With this solution, the content doesn't change at all.
> > The weird part would be the user interface, because the information
> > psql needs is not only "CSV", it's also the options DELIMITER, QUOTE,
> > ESCAPE and possibly ENCODING. Currently it doesn't know any of these,
> > they're passed to the server in an opaque, unparsed form within
> > the COPY command.
> >
> > Personally, the solution I find cleaner is the server side not having
> > any end-of-data marker for CSV. So backslash-dot would never be
> > special. psql could allow for a custom ending boundary for in-script
> > data, and users could set that to backslash-dot if they want, but that
> > would be their choice.
> > That would be clearly not backward compatible, and I believe it wouldn't
> > work with the v2 protocol, so I'm not sure it would have much chance of
> > approval.
>
> I had forgotten that the DELIMITER and QUOTE can be changed --- that
> kills the idea of adding a simple CSV parser into psql because we would
> have to parse the COPY SQL command as well.
>
> I am wondering if we should just disallow CSV from STDIN, on security
> grounds. How big a problem would that be for people? Would we have to
> disable to STDOUT as well since it could not be restored? Should we
> issue some kind of security warning in such cases? Should we document
> this?
>

it is pretty common pattern for etl, copy from stdin. I am thinking it can
be big problem

> In hindsight, I am not sure how we could have designed this more
> securly. I guess we could have required some special text to start all
> CSV continuation lines that were not end-of-file, but that would have
> been very unportable, which is the goal of CSV.
>
> --
> Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
> EnterpriseDB http://enterprisedb.com
>
> + As you are, so once was I. As I am, so you will be. +
> + Ancient Roman grave inscription +
>
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-01-30 18:20:59 Re: backslash-dot quoting in COPY CSV
Previous Message Bruce Momjian 2019-01-30 17:50:59 Re: backslash-dot quoting in COPY CSV