Re: utf8 COPY DELIMITER?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Mark Dilger <pgsql(at)markdilger(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: utf8 COPY DELIMITER?
Date: 2007-04-17 18:28:18
Message-ID: 4129.1176834498@sss.pgh.pa.us
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> Mark Dilger wrote:
>> I'm working on fixing bugs relating to multibyte character encodings.
>> I wasn't sure whether this was a bug or not. I don't think we should
>> use the phrasing "COPY delimiter must be a single character" when, in
>> utf8 land, I did in fact use a single character. We might say "a
>> single byte", or we might extend the functionality to handle multibyte
>> characters.

> Doing the latter would be a feature, and so is of course right off the
> table for this release. Changing the error messages to be clearer should
> be fine.

+1 on changing the message: "character" is clearly less correct than "byte"
here.
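
To make the character/byte distinction concrete, here is a minimal sketch in Python. It is not PostgreSQL's code; the helper name is_single_byte_delimiter and the choice of the section sign as a sample delimiter are assumptions made only for illustration. It shows that one character can occupy more than one byte in UTF-8, which is why a length check done in bytes rejects it.

```python
def is_single_byte_delimiter(delim: str, encoding: str = "utf-8") -> bool:
    """Return True when delim encodes to exactly one byte.

    Hypothetical helper for illustration; not PostgreSQL's actual check.
    """
    return len(delim.encode(encoding)) == 1


section_sign = "\u00a7"  # a single character, U+00A7

print(len(section_sign))                        # length in characters
print(len(section_sign.encode("utf-8")))        # length in bytes
print(is_single_byte_delimiter("|"))            # ASCII pipe
print(is_single_byte_delimiter(section_sign))   # multibyte in UTF-8
```

A check phrased in bytes accepts "|" and rejects the section sign, even though both are one character long.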

I doubt that supporting a single multibyte character would be an
interesting extension --- if we wanted to do anything at all there, we'd
just generalize the delimiter to be an arbitrary string. But it would
certainly slow down COPY by some amount, which is an area where you'll
get push-back for performance losses, so you'd need to make a convincing
use-case for it.
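
As a rough sketch of what the generalization would mean, the Python below contrasts a scan that compares each byte against one delimiter byte with a search for an arbitrary delimiter string. Both functions and their names are hypothetical; this is not how COPY is implemented, it ignores quoting and escaping entirely, and it makes no claim about the size of the slowdown.

```python
from typing import List


def split_on_byte(line: bytes, delim: int) -> List[bytes]:
    """Split a line by comparing every byte against one delimiter byte."""
    fields = []
    start = 0
    for i, b in enumerate(line):
        if b == delim:
            fields.append(line[start:i])
            start = i + 1
    fields.append(line[start:])
    return fields


def split_on_string(line: bytes, delim: bytes) -> List[bytes]:
    """Split a line on an arbitrary, non-empty delimiter byte string."""
    if not delim:
        raise ValueError("delimiter must not be empty")
    fields = []
    start = 0
    while True:
        pos = line.find(delim, start)  # substring search, not a byte compare
        if pos < 0:
            fields.append(line[start:])
            return fields
        fields.append(line[start:pos])
        start = pos + len(delim)
```

With the string form, a single multibyte character is just one particular case: split_on_string("a§b".encode("utf-8"), "§".encode("utf-8")) separates the two fields without any special handling for UTF-8.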

regards, tom lane
