Re: COPY formatting

From: Chris Browne <cbbrowne(at)acm(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: COPY formatting
Date: 2004-03-17 19:03:10
Message-ID: 60u10nz47l.fsf@dev6.int.libertyrms.info
Lists: pgsql-hackers

andrew(at)dunslane(dot)net (Andrew Dunstan) writes:
> Karel Zak wrote:
>
>> Hi,
>>
>> the TODO has an item: "* Allow dump/load of CSV format". I don't think
>> it's a clean idea. Why CSV and not something else? :-)
>>
>> And why not give users full control of the format through their own
>> function? That means something like:
>>     COPY tablename [ ( column [, ...] ) ]
>>     TO { 'filename' | STDOUT }
>>     [ [ WITH ] [ BINARY ]
>>                [ OIDS ]
>>                [ DELIMITER [ AS ] 'delimiter' ]
>>                [ NULL [ AS ] 'null string' ]
>>                [ FORMAT funcname ] ]
>>                ^^^^^^^^^^^^^^^^^^^
>> The formatting function API can be pretty simple:
>>
>>     text *my_copy_format(text *attrdata, int direction, int nattrs,
>>                          int attr, Oid attrtype, Oid relation)
>>
>> -- it's pseudocode of course; it should use the standard fmgr
>> interface.
>> It's probably interesting for the non-binary version of COPY.
>
> Interesting ... The alternative might be an external program to munge
> CSVs and whatever other format people want to support and then call
> the existing COPY, either in bin or contrib. I have seen lots of
> people wanting to import CSVs, and that's even before we get a Windows
> port.
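
To make the "external program" idea concrete, here is a minimal sketch in
Python of a filter that turns CSV into the tab-delimited text format that
the existing COPY already reads. The function names and the choice to
treat empty fields as NULL are assumptions made for illustration; nothing
like this is shipped in bin or contrib.

```python
import csv
import io

# Escapes used by COPY's default text format. Backslash must be escaped
# so that it is not read as the start of another escape sequence.
_ESCAPES = {"\\": "\\\\", "\t": "\\t", "\n": "\\n", "\r": "\\r"}


def copy_escape(value):
    """Escape one field for COPY text format; None becomes \\N (NULL)."""
    if value is None:
        return "\\N"
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


def csv_to_copy_text(csv_text, null_marker=""):
    """Convert CSV text to tab-delimited COPY input, one line per row.

    Fields equal to null_marker are emitted as NULL. That is an
    assumption of this sketch: CSV has no standard spelling for NULL.
    """
    lines = []
    # newline="" keeps line breaks inside quoted CSV fields intact.
    for row in csv.reader(io.StringIO(csv_text, newline="")):
        fields = [None if f == null_marker else f for f in row]
        lines.append("\t".join(copy_escape(f) for f in fields))
    return "\n".join(lines) + "\n" if lines else ""
```

The output could then be piped into psql running COPY tablename FROM STDIN.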

I know Jan Wieck has been working on something like this, with a bit
of further smarts...

- Given a table definition alongside the data, the table can be created
as part of the same load;

- A set of mapping functions can be used, so that if, for instance,
the program generating the data was Excel, and you have a field with
values like 37985, 38045, or 38061, they can respectively be mapped
to '2004-01-01', '2004-03-01', and '2004-03-17';

- It can load whatever data is loadable, using Ethernet-like backoffs
when it encounters bad records, so that all the good data gets loaded
and only a bundle of 'crud' is left over.
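
The mapping-function point in the list above can be sketched in a few
lines of Python. This is only an illustration, not Jan's code: the column
name and function names are made up. The default day zero, 1900-01-01, is
chosen because it reproduces the three figures quoted above. Excel's own
1900 date system numbers 1900-01-01 as 1 and also counts a nonexistent
1900-02-29, so for real spreadsheet serials the usual day zero is
1899-12-30, under which 2004-01-01 is 37987.

```python
from datetime import date, timedelta

# Day zero that reproduces the figures in the message (37985 -> 2004-01-01).
MESSAGE_EPOCH = date(1900, 1, 1)
# Day zero commonly used for Excel's 1900 date system (serials >= 61).
EXCEL_1900_EPOCH = date(1899, 12, 30)


def serial_to_iso(serial, epoch=MESSAGE_EPOCH):
    """Map a day-count serial to an ISO date string (YYYY-MM-DD)."""
    return (epoch + timedelta(days=int(serial))).isoformat()


def map_row(row, mappers):
    """Apply a per-column mapping function; unmapped columns pass through."""
    return {col: mappers.get(col, lambda v: v)(val) for col, val in row.items()}


# Hypothetical column name, for illustration only.
MAPPERS = {"order_date": serial_to_iso}
```

For example, map_row({"id": "7", "order_date": "38061"}, MAPPERS) yields
{"id": "7", "order_date": "2004-03-17"}.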

He had been prototyping it in Tcl; I'm not sure how far a port to C
has gotten. It looked pretty neat; it sure seems better to put the
"cleverness" in userspace than to try to increase the complexity of
the postmaster...
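
As a rough userspace sketch of the load-what-is-loadable point, here is
one way to read "Ethernet-like backoffs" in Python: halve the batch size
when a batch fails, double it again after a success, and set a record
aside once it fails on its own. That reading is my assumption, and
load_batch is a hypothetical callback (say, one COPY in one transaction)
that must either load the whole batch or raise and load nothing.

```python
def load_with_backoff(records, load_batch, max_batch=1000):
    """Load records in batches, backing off around bad records.

    Returns (loaded_count, rejected), where rejected holds the records
    that failed even when submitted alone.
    """
    loaded, rejected = 0, []
    size, pos = max_batch, 0
    while pos < len(records):
        batch = records[pos:pos + size]
        try:
            load_batch(batch)
        except Exception:
            if len(batch) == 1:
                # The failure is isolated to this record: set it aside.
                rejected.append(batch[0])
                pos += 1
            else:
                # Back off: retry the same position with half the batch.
                size = max(1, len(batch) // 2)
            continue
        loaded += len(batch)
        pos += len(batch)
        # Recover: grow the batch again after a successful load.
        size = min(max_batch, size * 2)
    return loaded, rejected
```

Every good record ends up loaded and the bad ones come back in the
rejected list, at the cost of some retried batches near each bad record.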
--
output = ("cbbrowne" "@" "cbbrowne.com")
http://cbbrowne.com/info/linuxxian.html
Have you heard of the new Macsyma processor? It has three instructions --
LOAD, STORE, and SKIP IF INTEGRABLE.
