Re: Different length lines in COPY CSV

From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Tino Wildenhain <tino(at)wildenhain(dot)de>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgman(at)candle(dot)pha(dot)pa(dot)us, chriskl(at)familyhealth(dot)com(dot)au, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Different length lines in COPY CSV
Date: 2005-12-12 20:35:45
Message-ID: 20051212203535.GD30160@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Dec 12, 2005 at 09:30:12PM +0100, Tino Wildenhain wrote:
> Am Montag, den 12.12.2005, 15:08 -0500 schrieb Andrew Dunstan:
> > You are probably right. The biggest wrinkle will be dealing with various
> > encodings, I suspect. That at least is one thing that doing CSV within
> > the backend bought us fairly painlessly. Perl's Text::CSV_XS module for
> > example simply handles this by declaring that only [\x09\x20-\x7f] are
> > valid in its non-binary mode, and in either mode appears to be MBCS
> > unaware. We should try to do better than that.
>
> Are there any test datafiles available in a repository?
> I could give it a shot I think.
>
> If not maybe we could set up something like that.

Note, recent versions of Perl allow you to specify the file encoding
when you open the file and will convert things to UTF-8 as appropriate.
So in theory it should be fairly simple to make a script that could
handle various encodings. The hardest part is always determining which
encoding a file is in in the first place...

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Jim C. Nasby 2005-12-12 20:40:03 Re: Please Help: PostgreSQL Query Optimizer
Previous Message Tino Wildenhain 2005-12-12 20:30:12 Re: Different length lines in COPY CSV