Re: Support UTF-8 files with BOM in COPY FROM

From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: david(at)kineticode(dot)com, itagaki(dot)takahiro(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Support UTF-8 files with BOM in COPY FROM
Date: 2011-09-26 15:09:09
Message-ID: 20110927.000909.594224957113812106.t-ishii@sraoss.co.jp
Lists: pgsql-hackers

> "David E. Wheeler" <david(at)kineticode(dot)com> writes:
>> On Sep 25, 2011, at 9:58 PM, Itagaki Takahiro wrote:
>>> I'm thinking about only COPY FROM for reads, but if someone wants to add
>>> BOM in COPY TO, we might also support COPY TO WITH BOM for writes.
>
>> I think it would have to be optional, since "some recipients of UTF-8 encoded data do not expect a BOM."
>
> Putting a BOM into UTF8 data is flat out invalid per spec --- the fact
> that Microsloth does it does not make it standards-conformant.
>
> I think that accepting it on input can be sensible, on the principle of
> "be liberal in what you accept", but the other side of that is "be
> conservative in what you send". No BOMs in output, please.

Suppose a user uses a brain-dead editor that does not accept UTF-8
without a BOM. He decides to save his editor data into PostgreSQL using
COPY FROM, and later extracts the data using COPY TO. Now he finds that
his stupid editor does not accept his data any more.

So I think if we decide to accept UTF-8 with a BOM, we should keep the
BOM when importing the data and output the data with the BOM. If we
don't want to output UTF-8 with a BOM, we should not accept UTF-8 with
a BOM. It seems we don't have much choice...
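
To make the round-trip concern concrete, here is a minimal sketch in
Python. It is not PostgreSQL's COPY code; the function names
read_with_bom_flag and write_with_optional_bom are hypothetical and
exist only for illustration. It strips a leading UTF-8 BOM (bytes
EF BB BF) on input, then compares output without a BOM against output
that restores it.

```python
import codecs
from typing import Tuple

UTF8_BOM = codecs.BOM_UTF8  # the three bytes b"\xef\xbb\xbf"


def read_with_bom_flag(raw: bytes) -> Tuple[str, bool]:
    """Hypothetical import step: decode UTF-8, strip a leading BOM,
    and report whether one was present."""
    had_bom = raw.startswith(UTF8_BOM)
    if had_bom:
        raw = raw[len(UTF8_BOM):]
    return raw.decode("utf-8"), had_bom


def write_with_optional_bom(text: str, with_bom: bool) -> bytes:
    """Hypothetical export step: encode as UTF-8, optionally prefixing a BOM."""
    data = text.encode("utf-8")
    return UTF8_BOM + data if with_bom else data


# A file saved by an editor that always writes a BOM.
original = UTF8_BOM + "id,name\n1,Tatsuo\n".encode("utf-8")

text, had_bom = read_with_bom_flag(original)

# "Liberal in, conservative out": the BOM is dropped on export.
strict_output = write_with_optional_bom(text, with_bom=False)

# Round-trip preserving: the BOM is restored because the input had one.
round_trip_output = write_with_optional_bom(text, with_bom=had_bom)
```

In this sketch strict_output differs from the original file by the
three BOM bytes, which is what the editor in the scenario above would
reject, while round_trip_output is byte-identical to the input. The
catch is that the had_bom flag has to be remembered somewhere between
import and export.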
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
