Re: raw output from copy

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Daniel Verite <daniel(at)manitou-mail(dot)org>, hlinnaka <hlinnaka(at)iki(dot)fi>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pavel Golub <pavel(at)microolap(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject: Re: raw output from copy
Date: 2016-03-29 16:19:13
Message-ID: 14226.1459268353@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
> I tested COPY RAW on old psql clients - and it is working without any
> problem - so when the client uses same logic as psql, then it should to
> work. Sure, there can be differently implemented clients, but the COPY
> client side is usually simple - store stream to output.

My point is precisely that I doubt all clients are that stupid about COPY.

> Maybe I am blind, but I don't see any new security risks. The risk can be
> only on client side - and if client is not able work with new value, then
> it can fails.

Well, the point is that low-level code might get used to process the data
stream for commands it doesn't have any control over. Maybe there's no
realistic security risk there, or maybe there is; I'm not sure.

> I am thinking so PQbinaryTuples should to return 1 (without change), and
> PQfformat should to return 2.

Well, that seems pretty backwards to me. The format of the individual
fields is still what it is under COPY BINARY; you would not use a
different per-field transformation. You do need to know about the
overall format of the copy data stream being different, and defining
PQbinaryTuples as still returning 1 means there's no clean way to
understand overall copy format vs. per-field format.

There's a case to be made that we should invent a new function named
along the lines of PQcopyFormat() rather than overloading PQbinaryTuples()
some more. That function is currently deprecated and I'm not very happy
with un-deprecating it only to use it in a confusing way.

To be more concrete about this: I think it's actually rather broken
that this patch ties RAW to binary format of the field contents.
Why would it not be exactly as useful to have delimiter-less COPY
of textual data, for use when there's just one datum and/or you're
confident in picking the data apart for yourself? But as things stand
it'd be too confusing for an application to try to figure out what's
happening in such a case.

So I think we should either invent RAW_TEXT and RAW_BINARY formats
(not just RAW) or make RAW be an orthogonal copy option. And we need
to improve libpq's behavior enough so that applications can sanely
figure out what's happening.

> I executed all tests in libpq and ecpg without any problems. Can you,
> please, help me with repeating a ecpg issues?

Of course the ecpg tests pass; you didn't extend them to see what would
happen if someone tries COPY RAW with ecpg. Likewise, we have no tests
exercising a client's use of libpq with more intelligence than psql has
got. But that doesn't mean it's acceptable to write this patch with no
thought for such clients.

I am fairly sure that there actually are third-party client libraries
that have more intelligence about COPY than psql, but I do not remember
any specifics unfortunately.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fabien COELHO 2016-03-29 16:23:11 Re: extend pgbench expressions with functions
Previous Message Shulgin, Oleksandr 2016-03-29 16:17:17 Re: unexpected result from to_tsvector