
Re: COPY Transform support

From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: COPY Transform support
Date: 2008-04-03 15:46:05
Message-ID: 200804031746.07936.dfontaine@hi-media.com
Lists: pgsql-hackers
On Thursday, April 3, 2008, Tom Lane wrote:
> The major concern I have about this is to ensure that no detectable
> overhead is added to COPY when the feature isn't being used.

Well, when COLUMN x CONVERT USING (or whatever syntax we choose) is not used, we 
fall back to the current code path: we do not touch the data content at all 
before treating it as valid input syntax for the target table's datatypes.

And the syntax check is done only once, before beginning to read the data 
lines from the file.

> I am not actually convinced that the column-by-column design you seem to
> have in mind is worth anything.  The examples that I remember seeing
> often involve removing columns, generating one column from multiple ones
> or vice versa, dealing with nonstandard column delimiters, etc.

Yes, that is another need, but one better solved, in my opinion, by loading 
the data into a (temp) loadtable and then processing it with SQL:
  INSERT INTO destination_table SELECT whatever FROM loadtable;

The problem I'm trying to solve is different: I want COPY to be able to load 
data into a table even when the representation we have in the file does not 
match what the datatype input function expects.

An example might help us talk about the same thing. mysqldump's CSV output 
sometimes represents timestamps (depending on server version) as 
'20041002152952' where PostgreSQL expects '2004-10-02 15:29:52'. I'd like COPY 
to be able to cope with this situation.
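For illustration only, here is a minimal Python sketch (my own example, not 
part of any proposal) of the reformatting such a per-column transform would 
have to perform:

```python
from datetime import datetime

def mysql_ts_to_pg(value):
    """Rewrite a mysqldump compact timestamp such as '20041002152952'
    into the '2004-10-02 15:29:52' form that PostgreSQL's timestamp
    input function expects."""
    return datetime.strptime(value, "%Y%m%d%H%M%S").strftime("%Y-%m-%d %H:%M:%S")
```

Whatever COPY syntax we end up with, the work per field is exactly this kind 
of cheap string rewrite, applied before the datatype input function ever sees 
the value.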

Now, another syntax proposal could solve both needs. We basically need to be 
able to transform input fields into input columns, such that N input fields 
(found in the data file) yield M input columns:

  COPY destination_table(col1, col2, col3, col4) 
 USING (field1, field2 || field3, myfunc(field4, field5))
  FROM 'file.txt'
  WITH ...

This could be better than preprocessing, then COPY, then INSERT INTO ... SELECT, 
because we don't need a temp table (no need to worry about its name being 
unique, nor to fiddle with temp_buffers), etc. 
You're better placed than I am to say why one COPY command beats a two-step 
load through a table; I'm just guessing ;)

And if it's better for the user to preprocess in Perl and then COPY, that 
option remains.
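To illustrate that fallback, here is a hedged sketch of such a preprocessing 
filter, in Python rather than Perl; the column index and the timestamp format 
are assumptions made up for this example:

```python
import csv
import sys
from datetime import datetime

TS_COLUMN = 2  # hypothetical: index of the compact-timestamp field


def fix_row(row):
    # Rewrite '20041002152952' into '2004-10-02 15:29:52' in place.
    row[TS_COLUMN] = (
        datetime.strptime(row[TS_COLUMN], "%Y%m%d%H%M%S")
        .strftime("%Y-%m-%d %H:%M:%S")
    )
    return row


def main(infile=sys.stdin, outfile=sys.stdout):
    # Stream CSV rows from infile to outfile, fixing the timestamp column.
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    for row in reader:
        writer.writerow(fix_row(row))


if __name__ == "__main__":
    main()
```

One would pipe the dump file through such a filter before feeding it to 
COPY ... FROM STDIN; the point of the proposal above is to make that extra 
step unnecessary for the common cases.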
-- 
dim
