Re: raw output from copy

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Daniel Verite" <daniel(at)manitou-mail(dot)org>
Cc: "Andrew Dunstan" <andrew(at)dunslane(dot)net>, "Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com>, "hlinnaka" <hlinnaka(at)iki(dot)fi>, "PostgreSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>, "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Pavel Golub" <pavel(at)microolap(dot)com>, "Craig Ringer" <craig(at)2ndquadrant(dot)com>
Subject: Re: raw output from copy
Date: 2016-04-04 16:55:46
Message-ID: 12799.1459788946@sss.pgh.pa.us
Lists: pgsql-hackers

"Daniel Verite" <daniel(at)manitou-mail(dot)org> writes:
> One reason for adding the format to COPY is that it's where users
> are looking for it. It's the canonical way of importing contents
> from files so that's where it makes more sense.

I'm not sure I buy that argument, because it could be used to justify
adding absolutely any ETL functionality to COPY. And we don't want
to go down that path; the design intention for COPY is that it be as
simple and fast as possible.

>> And I am still waiting for a non-psql use case. But I don't expect to
>> see one, precisely because most clients have no difficulty at all in
>> handling binary data.

> You mean small or medium-size binary data. The 512MB-1GB range is
> impossible to handle if requested in text format, which is what drivers
> tend to use. Even pg_dump fails on these contents.

... which is COPY. I do not see that RAW mode is going to help much
here: it's not going to be noticeably better than COPY BINARY in terms
of maximum field width.
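A minimal sketch, not taken from the thread, of how a client could pull raw field bytes out of COPY BINARY output. For a single-column query run as COPY ... TO STDOUT (FORMAT binary), it yields each field's bytes, roughly what a RAW mode would hand back. The function name raw_fields is hypothetical. The framing follows the documented binary COPY file format: an 11-byte signature, a 32-bit flags word, a header-extension length, and then per-tuple records made of a 16-bit field count and a 32-bit length in front of each field. The sketch assumes the whole stream is already in memory and that the OID flag (bit 16) is clear. Each field carries only a 4-byte length prefix, so the framing adds almost nothing on top of the data itself.

```python
import struct

# Fixed 11-byte signature that starts every binary COPY stream.
SIGNATURE = b"PGCOPY\n\xff\r\n\x00"

def raw_fields(stream: bytes):
    """Hypothetical helper: yield the raw bytes of each field in a
    single-column binary COPY stream (None for SQL NULL).
    Assumes the full stream is in memory and no OIDs are included."""
    if not stream.startswith(SIGNATURE):
        raise ValueError("missing binary COPY signature")
    pos = len(SIGNATURE)
    # 32-bit flags word, then the length of the header extension area.
    flags, ext_len = struct.unpack_from("!ii", stream, pos)
    if flags & (1 << 16):
        raise ValueError("streams with OIDs are not handled in this sketch")
    pos += 8 + ext_len  # skip the extension area entirely
    while True:
        # Each tuple begins with a 16-bit field count; -1 marks the trailer.
        (nfields,) = struct.unpack_from("!h", stream, pos)
        pos += 2
        if nfields == -1:
            return
        if nfields != 1:
            raise ValueError("expected exactly one column per tuple")
        # 32-bit field length; -1 means NULL and is followed by no data.
        (length,) = struct.unpack_from("!i", stream, pos)
        pos += 4
        if length == -1:
            yield None
        else:
            yield stream[pos:pos + length]
            pos += length
```

For example, a hand-built stream with one bytea value and one NULL decodes to [payload, None]. In practice the bytes would come from whatever COPY-out interface the client driver exposes.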

>> Code that uses PQexecParams() binary "resultFormat", or the
>> binary format of copy doesn't have that problem, but most
>> client-side drivers don't do that.

> And maybe they just can't realistically, because getting result
> format in binary is exposed as an all-or-nothing choice in libpq.

That's simply wrong. Read the documentation for PQexecParams and
friends: you can specify text or binary per-column. It's COPY that
has the only-one-column-format restriction, and RAW certainly isn't
going to make that better.

I'm not quite as convinced as Andrew that RAW mode is unnecessary,
but I don't find these arguments for it to be very compelling.

The real issue to my mind is that it doesn't seem like we can shoehorn
a sanely-defined version of RAW into the existing protocol spec without
creating compatibility hazards. So we can either wait for the mythical
protocol v4 (but even a protocol update wouldn't fix the application-level
hazards) or we can treat it as a problem to be solved client-side.

regards, tom lane