Re: raw output from copy

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Daniel Verite <daniel(at)manitou-mail(dot)org>, hlinnaka <hlinnaka(at)iki(dot)fi>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pavel Golub <pavel(at)microolap(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject: Re: raw output from copy
Date: 2016-03-29 17:28:13
Message-ID: CAFj8pRBipcyx5MaqFX1aBR_ruvo0JiHdbd+GxL=98DGvuyFx5g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi

2016-03-29 18:19 GMT+02:00 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:

> Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
> > I tested COPY RAW on old psql clients - and it is working without any
> > problem - so when the client uses same logic as psql, then it should to
> > work. Sure, there can be differently implemented clients, but the COPY
> > client side is usually simple - store stream to output.
>
> My point is precisely that I doubt all clients are that stupid about COPY.
>
> > Maybe I am blind, but I don't see any new security risks. The risk can be
> > only on client side - and if client is not able work with new value, then
> > it can fails.
>
> Well, the point is that low-level code might get used to process the data
> stream for commands it doesn't have any control over. Maybe there's no
> realistic security risk there, or maybe there is; I'm not sure.
>
> > I am thinking so PQbinaryTuples should to return 1 (without change), and
> > PQfformat should to return 2.
>
> Well, that seems pretty backwards to me. The format of the individual
> fields is still what it is under COPY BINARY; you would not use a
> different per-field transformation. You do need to know about the
> overall format of the copy data stream being different, and defining
> PQbinaryTuples as still returning 1 means there's no clean way to
> understand overall copy format vs. per-field format.
>

> There's a case to be made that we should invent a new function named
> along the lines of PQcopyFormat() rather than overloading PQbinaryTuples()
> some more. That function is currently deprecated and I'm not very happy
> with un-deprecating it only to use it in a confusing way.
>

I see a introduction of PQcopyFormat() as best idea. So for
PQbinaryTuples() and PQfformat() these new changes are transparent - and
PQcopyFormat can returns info about used method.

> To be more concrete about this: I think it's actually rather broken
> that this patch ties RAW to binary format of the field contents.
> Why would it not be exactly as useful to have delimiter-less COPY
> of textual data, for use when there's just one datum and/or you're
> confident in picking the data apart for yourself? But as things stand
> it'd be too confusing for an application to try to figure out what's
> happening in such a case.
>
> So I think we should either invent RAW_TEXT and RAW_BINARY formats
> (not just RAW) or make RAW be an orthogonal copy option. And we need
> to improve libpq's behavior enough so that applications can sanely
> figure out what's happening.
>

I had a use case that required binary mode. Higher granularity has sense.

This opening new question - RAW_TEXT will use text output function. But if
I will pass this value as text value, then a behave of current clients will
be same as usual COPY. So I need to use binary protocol. And then the
behave of PQbinaryTuples() and PQfformat() is the question? Although text
value can be passed in binary mode too (with format [length, data...]).

>
> > I executed all tests in libpq and ecpg without any problems. Can you,
> > please, help me with repeating a ecpg issues?
>
> Of course the ecpg tests pass; you didn't extend them to see what would
> happen if someone tries COPY RAW with ecpg. Likewise, we have no tests
> exercising a client's use of libpq with more intelligence than psql has
> got. But that doesn't mean it's acceptable to write this patch with no
> thought for such clients.
>

if we don't change PQbinaryTuples() and PQfformat(), then COPY RAW should
be transparent for any client. Server sending data in binary format - what
is generic.

>
> I am fairly sure that there actually are third-party client libraries
> that have more intelligence about COPY than psql, but I do not remember
> any specifics unfortunately.
>

The COPY RAW should not to break any existing application. This is new
feature - and old application, old client use COPY RAW newer. I see as
important the conformity of used mode (text/binary) and PQbinaryTuples()
and PQfformat().

I am writing few lines as summary:

1. invention RAW_TEXT and RAW_BINARY
2. for RAW_BINARY: PQbinaryTuples() returns 1 and PQfformat() returns 1
3.a for RAW_TEXT: PQbinaryTuples() returns 0 and PQfformat() returns 0, but
the client should to check PQcopyFormat() to not print "\n" on the end
3.b for RAW_TEXT: PQbinaryTuples() returns 1 and PQfformat() returns 1, but
used output function, not necessary client modification
4. PQcopyFormat() returns 0 for text, 1 for binary, 2 for RAW_TEXT, 3 for
RAW_BINARY
5. create tests for ecpg

Is it ok?

What do you prefer 3.a, or 3.b?

Regards

Pavel

>
> regards, tom lane
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Teodor Sigaev 2016-03-29 17:29:47 Re: WIP: Access method extendability
Previous Message Robert Haas 2016-03-29 17:26:28 Re: Reworks of CustomScan serialization/deserialization