Re: Ragged CSV import

From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Ragged CSV import
Date: 2009-09-11 08:27:06
Message-ID: m2eiqd28f9.fsf@hi-media.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> I do like the idea of COPY returning a SETOF text[], but I am not at all
> clear on the mechanics of feeding STDIN to an SRF. ISTM that something like
> a RETURNING clause on COPY and the ability to use it in FROM clause or
> something similar might work better.

I envisonned COPY "just" returning all what it reads (minus extra discarded
column as soon as your proposal gets implemented), in the FROM clause,
and the user sorting out what he wants in the SELECT clause.

> I understand the difficulties, but
> maybe we could place some restrictions on where it could be used so as to
> obviate at least some of those.

Maybe instead of opening FROM for COPY, having it accepted in WITH would
be better, the same way (from the user point of view) that DML returning
are worked on.

> INSERT into foo (x,y,z) select t[3],[t2],[t57] from (COPY RETURNING
> t FROM stdin CSV);

Would become:

WITH csv AS (
COPY t FROM stdin CSV
)
INSERT INTO foo(x, y, z)
SELECT t[3], t[2], mytimestamptz([5], [6], [7])
FROM csv;

Now the text[] has a strange feeling, without it it'd be:

WITH csv AS (
COPY t(a, b, c, d, e, f, g)
FROM stdin
CSV IGNORING EXTRA COLUMNS -- random nice syntax
MISSING COLUMNS DEFAULTS NULL -- that needs some reality check
)
INSERT INTO foo(x, y, z)
SELECT c, b, mytimestamptz(e, f, g)
FROM csv;

The function mytimestamptz(date text, time text, timezone text) will
accept input that PostgreSQL input types would have errored out on... so
you can process in one go strange formats from other products.

> The only thing that's been seriously on the table that isn't accounted for
> by something like this is the suggestion of making the header line have some
> semantic significance, and I'm far from sure that's a good idea.

Yeah, and it seems only useful when you don't have any way to play with
what COPY returns before it goes to a table (magic column reordering or
ignoring).

Regards,
--
dim

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2009-09-11 08:44:45 Re: [COMMITTERS] pgsql: On Windows, when a file is deleted and another process still has
Previous Message Magnus Hagander 2009-09-11 06:13:50 Re: logging hook for database audit