Re: raw output from copy

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Daniel Verite <daniel(at)manitou-mail(dot)org>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, hlinnaka <hlinnaka(at)iki(dot)fi>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pavel Golub <pavel(at)microolap(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>
Subject: Re: raw output from copy
Date: 2016-04-02 17:52:52
Message-ID: 570006F4.9090309@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 04/01/2016 11:42 AM, Daniel Verite wrote:
> Andrew Dunstan wrote:
>
>> If someone can make a good case that this is going to be of
>> general use I'll happily go along, but I haven't seen one so far.
> About COPY FROM with a raw format, for instance just yesterday
> there was this user question on stackoverflow:
> http://stackoverflow.com/questions/36317237
>
> which essentially is: how to import contents from a file without any
> particular interpretation of any character?\

There is so much wrong with this it's hard to know where to start.

Inserting the whole contents of a text file unchanged is insanely easy
in psql.

\set file `cat /path/to/file`
insert into mytable(contents) values(:'file');

What is more everyone on SO missed the fact that CSV mode gives you very
considerable control over the quote, delimiter and null settings.

See for example
<http://adpgtech.blogspot.com/2014/09/importing-json-data.html> which
has this example for handling files consisting of 1 json document per line:

copy the_table(jsonfield)
from '/path/to/jsondata'
csv quote e'\x01' delimiter e'\x02';

psql's \copy will work just the same way

(I noticed with amusement this week that CitusData is using pretty much
exactly this in one of their examples.)

>
> With the patch discussed in this thread, a user can do
> \copy table(textcol) from /path/to/file (format raw)
> or the equivalent COPY.
> If it's a binary column, that works just the same.

It would be fairly simple to invent a binary mechanism that did the
equivalent of the above insert. All without any change to SQL or the
backend at all.

>
> Without this, it's not obvious at all how this result can be
> achieved without resorting to external preprocessing,
> and assuming the availability of such preprocessing tools
> in the environment. Notwithstanding the fact that the
> solution proposed on SO (doubling backslashes with sed)
> doesn't even work if the file contains tabs, as they would be
> interpreted as field separators, even if the copy target has only
> one column. You can change the delimiter with COPY but AFAIK
> you can't tell that there is none.

There is arguably a good case for allowing a null delimiter. But that SO
page is just a terrible piece of misinformation, as far too often
happens in my experience.

And I am still waiting for a non-psql use case. But I don't expect to
see one, precisely because most clients have no difficulty at all in
handling binary data.

cheers

andrew

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2016-04-02 18:57:21 Re: Batch update of indexes
Previous Message Tom Lane 2016-04-02 17:20:59 Re: Transactional enum additions - was Re: Alter or rename enum value