Re: Ragged CSV import

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Ragged CSV import
Date: 2009-09-11 01:26:42
Message-ID: 4AA9A752.1070006@dunslane.net
Lists: pgsql-hackers

Stephen Frost wrote:
> * Andrew Dunstan (andrew(at)dunslane(dot)net) wrote:
>
>> Consider the suggestion withdrawn.
>>
>
> Let's not throw it out completely. The proposal to have COPY return a
> text[] in some fashion is interesting enough that others (ah, such as
> myself..) might be willing to put effort into it. Andrew, could you put
> your thoughts plus some example files onto a wiki page, at least? Then
> Robert, Tom, myself, etc, could update that to nail down the
> specification and then it's just an implementation detail, as it were.
>
>
>

I don't mind discussing the idea a bit.

I don't have any samples readily to hand, but really anything that's not
strictly rectangular meets my original case, like

a,b,c
1,2,3
4,5,6,7
8,9
10,11,12
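
As an illustration only (this is Python, not PostgreSQL code, and the name read_ragged is made up for the example), the sample above can be read into one list of text values per line, which is roughly the shape a SETOF text[] result would have:

```python
import csv
import io

# The ragged sample from the message, including its header line.
SAMPLE = "a,b,c\n1,2,3\n4,5,6,7\n8,9\n10,11,12\n"

def read_ragged(text):
    """Return one list of strings per CSV line, whatever its length."""
    return [row for row in csv.reader(io.StringIO(text))]

rows = read_ragged(SAMPLE)
for row in rows:
    # Print the field count next to the fields to show the raggedness.
    print(len(row), row)
```

Nothing here rejects the 4-field or 2-field lines; each row simply carries as many elements as the line had.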

I do like the idea of COPY returning a SETOF text[], but I am not at all
clear on the mechanics of feeding STDIN to an SRF. ISTM that something
like a RETURNING clause on COPY and the ability to use it in a FROM clause
or something similar might work better. I understand the difficulties,
but maybe we could place some restrictions on where it could be used so
as to obviate at least some of those.

One of the things I like about a SETOF text[] is that it lets you
reorder the columns, or cherry pick which ones you want. In fact, it
might be argued that the hacky FORCE NOT NULL, which has always
pained me somewhat, even if it was my idea ;-) might no longer be needed.

I'd love to be able to do something like

INSERT into foo (x,y,z) select t[3],t[2],t[57] from (COPY RETURNING
t FROM stdin CSV);
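
The COPY ... RETURNING form above is wished-for syntax, not something that exists. The column-picking half of the idea can be sketched outside the database, though. In the Python below, pick is a hypothetical helper, indexes are 1-based as in PostgreSQL arrays, and a position the row does not have comes back as None, on the assumption that it should behave like an out-of-range array subscript, which PostgreSQL returns as NULL:

```python
import csv
import io

def pick(row, *indexes):
    """Select 1-based positions from row; absent positions yield None."""
    return tuple(row[i - 1] if 1 <= i <= len(row) else None for i in indexes)

# Three ragged lines: 3, 4 and 2 fields.
data = "1,2,3\n4,5,6,7\n8,9\n"

# Analogous to: select t[3], t[2], t[4] from (... each line as t ...)
picked = [pick(row, 3, 2, 4) for row in csv.reader(io.StringIO(data))]
for values in picked:
    print(values)
```

Reordering and cherry-picking fall out of the subscripts alone, and short lines produce NULL-like gaps instead of errors.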

The only thing that's been seriously on the table that isn't accounted
for by something like this is the suggestion of making the header line
have some semantic significance, and I'm far from sure that's a good idea.

If time were not short on getting features presented I might attempt to
do it, but I have one very large monkey (and a few small ones) on my
back that I am determined to get rid of by the November CF, and there is
not a hope in the world I could get two large features done, even if we
had the details of this all sorted out and agreed on. That's why I said
"Consider the suggestion withdrawn".

cheers

andrew
