
Re: multiline CSV fields

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kris Jurka <books(at)ejurka(dot)com>,Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>,PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: multiline CSV fields
Date: 2004-11-30 04:34:13
Message-ID: 41ABF845.2060200@dunslane.net
Lists: pgsql-hackers, pgsql-patches

Tom Lane wrote:

>Kris Jurka <books(at)ejurka(dot)com> writes:
>  
>
>>Endlessly extending the COPY command doesn't seem like a winning 
>>proposition to me and I think if we aren't comfortable telling every user 
>>to write a script to pre/post-process the data we should instead provide a 
>>bulk loader/unloader that transforms things to our limited COPY 
>>functionality.  There are all kinds of feature requests I've seen 
>>along these lines that would make COPY a million-option mess if we try to 
>>support all of it directly.
>>    
>>
>
>I agree completely --- personally I'd not have put CSV into the backend
>either.
>
>IIRC we already have a TODO item for a separate bulk loader, but no
>one's stepped up to the plate yet :-(
>

IIRC, the way it happened was that a proposal was made to do CSV import/export in a fairly radical way. I countered with a much more modest approach, which was generally accepted and which Bruce and I then implemented, not without some angst (as well as a little sturm und drang).

The advantage of having it in COPY is that it can be done server-side, 
directly from the server's file system. For massive bulk loads that might 
be a plus, although I don't know what the protocol and socket overhead is. 
Maybe it would just be lost in the noise. Certainly I can see some sense in 
having COPY deal with straightforward cases and a bulk-load-unload 
program in bin to handle the hairier cases. Multiline fields would come 
into that category. The bulk-load-unload facility could possibly handle 
things other than CSV format too (XML anyone?). The nice thing about an 
external program is that it would not have to handle data embedded in an 
SQL stream, so the dangers from shifts in newline style, missing quotes, 
and the like would be far lower.
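To make the external-program idea concrete, here is a minimal sketch of the "hairier cases" half of such a tool. It assumes Python and its standard csv module (nothing in this thread proposes either), and the function names are hypothetical. It reads CSV whose quoted fields may span lines and rewrites each record in COPY's plain text format, where an embedded newline becomes the two-character escape \n, so COPY itself never sees a raw newline inside a field. NULL handling is deliberately left out: the csv module cannot tell a quoted empty field from an unquoted one, so every field is loaded as text.

```python
import csv
import io
from typing import TextIO


def copy_text_escape(value: str) -> str:
    """Escape one field for PostgreSQL's COPY text format (tab-delimited).

    The backslash is escaped first so that the escapes added afterwards
    are not themselves doubled.
    """
    return (
        value.replace("\\", "\\\\")
        .replace("\n", "\\n")
        .replace("\r", "\\r")
        .replace("\t", "\\t")
    )


def csv_to_copy_text(src: TextIO, dst: TextIO) -> int:
    """Convert CSV read from src into COPY text-format rows written to dst.

    src should be opened with newline="" so that newlines inside quoted
    fields reach the csv module untranslated. strict=True makes a missing
    closing quote raise csv.Error instead of silently swallowing the rest
    of the input into one field. Returns the number of rows written.
    """
    reader = csv.reader(src, strict=True)
    rows = 0
    for record in reader:
        if not record:
            # csv.reader yields [] for a blank line; COPY would reject an
            # empty line for a multi-column table, so skip it.
            continue
        # Assumption: every field is emitted as text; empty CSV fields stay
        # empty strings rather than becoming NULL (\N).
        dst.write("\t".join(copy_text_escape(field) for field in record))
        dst.write("\n")
        rows += 1
    return rows
```

Hooked up to standard input via io.TextIOWrapper(sys.stdin.buffer, newline=""), a filter like this could be piped into something like psql -c "COPY mytable FROM STDIN" (table name hypothetical). Because the quote and newline handling happens in a separate program rather than in the SQL stream, a malformed file fails loudly in the filter instead of inside the backend.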

We do need to keep things in perspective a bit. The small wrinkle that 
has spawned this whole thread will not affect most users of the facility, 
and many, many users will thank us for having provided it.

cheers

andrew

