Re: raw output from copy

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Daniel Verite <daniel(at)manitou-mail(dot)org>, hlinnaka <hlinnaka(at)iki(dot)fi>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pavel Golub <pavel(at)microolap(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject: Re: raw output from copy
Date: 2016-03-28 22:26:58
Message-ID: 14848.1459204018@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
> [ copy-raw-format-20160227-03.patch ]

I looked at this patch. I'm having a hard time accepting that it has
a use-case large enough to justify it, and here's the reason: it's
a protocol break. Conveniently omitting to update protocol.sgml
doesn't make it not a protocol break. (libpq.sgml also contains
assorted statements that are falsified by this patch.)

You could argue that it's the user's own fault if he tries to use
COPY RAW with client-side code that hasn't been updated to support it.
Maybe that's okay, but I wonder if we're opening ourselves up to
problems. Maybe even security-grade problems.

In terms of specific code that hasn't been updated, ecpg is broken
by this patch, and I'm not very sure what libpq's PQbinaryTuples()
ought to do but probably something other than what it does today.

There's also a definitional question of what we think PQfformat() ought
to do; should it return "2" for the per-field format? Or maybe the
per-field format is still "1", since it's after all the same binary data
format as for COPY BINARY, and only the overall copy format reported by
PQbinaryTuples() should change to "2".

BTW, I'm not really sure why the patch is trying to enforce single
row and column for the COPY OUT case. I thought the idea for that
was that we'd just shove out the data without any delimiters, and
if it's more than one datum it's the user's problem whether he can
identify the boundaries. On the input side we would have to insist
on one column since we're not going to attempt to identify boundaries
(and one row would fall out of the fact that we slurp the entire input
and treat it as one datum).

Anyway this is certainly not committable as-is, so I'm setting it back
to Waiting on Author. But the fact that both libpq and ecpg would need
updates makes me question whether we can safely pretend that this isn't
a protocol break.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2016-03-28 23:27:09 Re: pthread portability
Previous Message Alvaro Herrera 2016-03-28 22:14:57 Re: BRIN is missing in multicolumn indexes documentation