Re: COPY Transform support

From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: COPY Transform support
Date: 2008-04-03 15:46:05
Message-ID: 200804031746.07936.dfontaine@hi-media.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Le jeudi 03 avril 2008, Tom Lane a écrit :
> The major concern I have about this is to ensure that no detectable
> overhead is added to COPY when the feature isn't being used.

Well, when COLUMN x CONVERT USING or whatever syntax we choose is not used, we
default to current code path, that is we do not mess with data content at all
before to consider it's valid input syntax for target table datatypes.

And the syntax check is done only once, before beginning to read the data
lines from the file.

> I am not actually convinced that the column-by-column design you seem to
> have in mind is worth anything. The examples that I remember seeing
> often involve removing columns, generating one column from multiple ones
> or vice versa, dealing with nonstandard column delimiters, etc.

Yes, this is another need, but actually better solved, in my opinion, with
loading data into a (temp) loadtable then process it with SQL:
INSERT INTO destination_table SELECT whatever FROM loadtable;

The problem I'm trying to solve is not this one, I'm trying to have COPY able
to load data into a table when the representation of it we have into the file
does not match what datatype input function expects.

An example might help us talking about the same thing. mysqldump CSV outputs
timestamp sometimes (depending on server version) as '20041002152952' when
PostgreSQL expects '2004-10-02 15:29:52'. I'd like COPY to be able to cope
with this situation.

Now, another syntax proposal could have both the needs solved. We basically
need to be able to transform input fields and process them into input
columns, in a way that N input fields (found in the data file) will get us M
input columns:

COPY destination_table(col1, col2, col3, col4)
USING (field1, field2 || field3, myfunc(field4, field5))
FROM 'file.txt'
WITH ...

This could get better than preprocessing then COPY then INSERT INTO ... SELECT
because we don't need a temp table (don't need to care about its name being
unique, nor to mess up with temp_buffers), etc.
You're the one able to tell why it'll be better to have one COPY command
instead of a two table steps load, I'm just guessing ;)

And if it's better for the user to preprocess in perl then COPY, he still has
the option.
--
dim

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Joshua D. Drake 2008-04-03 15:47:12 Re: modules
Previous Message Tom Dunstan 2008-04-03 15:33:05 Re: modules