Re: Support UTF-8 files with BOM in COPY FROM

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support UTF-8 files with BOM in COPY FROM
Date: 2011-09-26 14:44:38
Message-ID: 6146.1317048278@sss.pgh.pa.us
Lists: pgsql-hackers

"David E. Wheeler" <david(at)kineticode(dot)com> writes:
> On Sep 25, 2011, at 9:58 PM, Itagaki Takahiro wrote:
>> I'm thinking about only COPY FROM for reads, but if someone wants to add
>> BOM in COPY TO, we might also support COPY TO WITH BOM for writes.

> I think it would have to be optional, since "some recipients of UTF-8 encoded data do not expect a BOM."

Putting a BOM into UTF8 data is flat out invalid per spec --- the fact
that Microsloth does it does not make it standards-conformant.

I think that accepting it on input can be sensible, on the principle of
"be liberal in what you accept", but the other side of that is "be
conservative in what you send". No BOMs in output, please.

regards, tom lane
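
For illustration only, and not taken from the patch under discussion: a minimal Python sketch of the "accept on input, never emit on output" rule described above. The function names `strip_utf8_bom` and `encode_output` are hypothetical, and the sketch assumes the whole input is available as a byte string rather than as a stream.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Drop a UTF-8 BOM (bytes EF BB BF) only if it is the very first thing in the input."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    # No leading BOM: return the input unchanged. The same three bytes
    # appearing later in the data are left alone.
    return data


def encode_output(text: str) -> bytes:
    """Encode for output. Python's plain "utf-8" codec never writes a BOM."""
    return text.encode("utf-8")
```

Only a BOM at offset zero is treated as a signature here; anywhere else the bytes are passed through as ordinary data.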
