From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Tatsuo Ishii <ishii(at)postgresql(dot)org>, david(at)kineticode(dot)com, itagaki(dot)takahiro(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Support UTF-8 files with BOM in COPY FROM |
Date: | 2011-09-26 17:19:26 |
Message-ID: | CA+TgmoZNw=F-+fvpH8xpeiph6kiAK1Vk1Ch4ONu6d+N-UG++5A@mail.gmail.com |
Lists: | pgsql-hackers |
On Mon, Sep 26, 2011 at 1:15 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Mon, Sep 26, 2011 at 11:09 AM, Tatsuo Ishii <ishii(at)postgresql(dot)org> wrote:
>>> Suppose a user uses a brain-dead editor, which does not accept UTF-8
>>> without a BOM.
>
>> Maybe this needs to be an optional behavior, controlled by some COPY option.
>
> I'm not excited about emitting non-standards-conformant output on the
> strength of a hypothetical argument about users and editors that may or
> may not exist. I believe that there's a use-case for reading BOMs, but
> I have seen no field complaints demonstrating that we need to write
> them. Even if we had a couple, "use a less brain dead editor" might be
> the best response. We cannot promise to be compatible with arbitrarily
> broken software.
The thing that makes me doubt that is this comment from Tatsuo Ishii:
TI> COPY explicitly specifies the encoding (to be UTF-8 in this case). So
TI> I think we should not regard U+FEFF as "BOM" in COPY, rather we should
TI> regard U+FEFF as "ZERO WIDTH NO-BREAK SPACE".
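If a BOM is confusable with valid data, then I think recognizing it
and discarding it unconditionally is no good: you could end up in a
situation where a COPY OUT, TRUNCATE, COPY IN round trip changes the
table contents, because a text value that legitimately begins with
U+FEFF would come back without it.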
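To make the round-trip concern concrete, here is a minimal sketch in Python. `copy_out` and `copy_in` are hypothetical stand-ins for COPY TO and COPY FROM on a one-column text table, not PostgreSQL code, and the `strip_bom` flag is a hypothetical model of an opt-in COPY option. Python's `utf-8-sig` codec skips a leading UTF-8 BOM when decoding, which models unconditional BOM stripping.

```python
# U+FEFF: a BOM at the start of a stream, otherwise ZERO WIDTH NO-BREAK SPACE.
BOM = "\ufeff"


def copy_out(rows):
    """Hypothetical stand-in for COPY TO: one newline-terminated line per row, UTF-8."""
    return "".join(row + "\n" for row in rows).encode("utf-8")


def copy_in(data, strip_bom):
    """Hypothetical stand-in for COPY FROM.

    With strip_bom=True, a leading U+FEFF is discarded as if it were a BOM
    (the 'utf-8-sig' codec skips it); with strip_bom=False it is kept as data.
    """
    text = data.decode("utf-8-sig" if strip_bom else "utf-8")
    # copy_out always ends each row with "\n", so drop the trailing empty piece.
    return text.split("\n")[:-1]


if __name__ == "__main__":
    # First row legitimately starts with ZERO WIDTH NO-BREAK SPACE.
    rows = [BOM + "hello", "world"]
    dumped = copy_out(rows)
    print(copy_in(dumped, strip_bom=False) == rows)  # round trip preserved
    print(copy_in(dumped, strip_bom=True) == rows)   # first value silently changed
```

In UTF-8, U+FEFF is the byte sequence `EF BB BF`, so a dump whose first value starts with that character is byte-for-byte indistinguishable from a file carrying a BOM. Only keeping the stripping off by default preserves the round trip.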
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company