Re: Support UTF-8 files with BOM in COPY FROM

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, david(at)kineticode(dot)com, itagaki(dot)takahiro(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Support UTF-8 files with BOM in COPY FROM
Date: 2011-09-26 16:07:26
Message-ID: CA+Tgmoa7SzcuViKfdbmWWeRmzZnjo93AmbhiOHaO9E=330PFow@mail.gmail.com
Lists: pgsql-hackers

On Mon, Sep 26, 2011 at 11:09 AM, Tatsuo Ishii <ishii(at)postgresql(dot)org> wrote:
>> "David E. Wheeler" <david(at)kineticode(dot)com> <CAJW2+qdYg1+xLaHDqnJs3AcKmCSVCDkv_LCAPWUtwmxL9dzVhQ(at)mail(dot)gmail(dot)com> writes:
>>> On Sep 25, 2011, at 9:58 PM, Itagaki Takahiro wrote:
>>>> I'm thinking about only COPY FROM for reads, but if someone wants to add
>>>> BOM in COPY TO, we might also support COPY TO WITH BOM for writes.
>>
>>> I think it would have to be optional, since "some recipients of UTF-8 encoded data do not expect a BOM."
>>
>> Putting a BOM into UTF8 data is flat out invalid per spec --- the fact
>> that Microsloth does it does not make it standards-conformant.
>>
>> I think that accepting it on input can be sensible, on the principle of
>> "be liberal in what you accept", but the other side of that is "be
>> conservative in what you send".  No BOMs in output, please.
>
> Suppose a user uses a brain-dead editor, which does not accept UTF-8
> without a BOM.  He decides to save his editor data into PostgreSQL using
> COPY FROM. He extracts the data using COPY TO. Now he finds that his
> stupid editor does not accept his data any more.
>
> So I think if we decide to accept UTF-8 with a BOM, we should keep the
> BOM when importing the data and output the data with a BOM. If we don't
> want to output UTF-8 with a BOM, we should not accept UTF-8 with a BOM.
> It seems we don't have much choice...

Maybe this needs to be an optional behavior, controlled by some COPY option.
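
To make the idea concrete, here is a rough sketch of how an opt-in
switch on each side might behave. It is illustrative Python, not
PostgreSQL code, and the names accept_bom and write_bom are hypothetical;
no such COPY options exist. The only fact assumed is that the UTF-8 BOM
is the three bytes EF BB BF at the very start of the file.

```python
import codecs
from typing import Tuple

# The UTF-8 byte order mark: the three bytes EF BB BF.
UTF8_BOM = codecs.BOM_UTF8


def strip_bom(data: bytes, accept_bom: bool = True) -> Tuple[bytes, bool]:
    """Input side (hypothetical option accept_bom).

    Returns (payload, had_bom). The BOM is removed only when the caller
    opted in and the data actually starts with one.
    """
    if accept_bom and data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):], True
    return data, False


def add_bom(data: bytes, write_bom: bool = False) -> bytes:
    """Output side (hypothetical option write_bom).

    Off by default, so output carries no BOM unless explicitly requested.
    """
    if write_bom and not data.startswith(UTF8_BOM):
        return UTF8_BOM + data
    return data
```

With both switches independent, the user with the BOM-only editor could
ask for a BOM on output, while everyone else gets BOM-free output by
default.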

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
