Re: Support UTF-8 files with BOM in COPY FROM

From: Brar Piening <brar(at)gmx(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support UTF-8 files with BOM in COPY FROM
Date: 2011-09-26 18:57:25
Message-ID: 4E80CB15.10706@gmx.de
Lists: pgsql-hackers

Tom Lane wrote:
> Putting a BOM into UTF8 data is flat out invalid per spec --- the fact
> that Microsloth does it does not make it standards-conformant.

Could you share a pointer to the spec?
All I've ever heard is that a BOM is optional for UTF-8 but not forbidden.

The Unicode FAQ (http://unicode.org/faq/utf_bom.html#BOM) states "that
some recipients of UTF-8 encoded data do not expect a BOM".
Postgres obviously belongs to those recipients.
That's why all my psql scripts that transfer data from MSSQL to Postgres
need a '\! perl -CD -pi.orig -e "tr/\x{feff}//d" "C:/datafile.txt"'
before feeding the data into COPY FROM.

Reading a BOM tolerantly and writing one only on user request is probably
the way that would help most users.
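
To illustrate the idea, here is a minimal sketch in Python (not PostgreSQL
code, and not a proposed patch). The function names strip_leading_bom and
encode_with_optional_bom are hypothetical, chosen only for this example. It
assumes the input is already known to be UTF-8.

```python
import codecs


def strip_leading_bom(data: bytes) -> bytes:
    """Tolerant read: drop one UTF-8 BOM if it starts the data."""
    # codecs.BOM_UTF8 is the three-byte sequence EF BB BF.
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    # No leading BOM: pass the data through unchanged.
    return data


def encode_with_optional_bom(text: str, write_bom: bool = False) -> bytes:
    """Write on request: prepend a BOM only when the caller asks for one."""
    payload = text.encode("utf-8")
    return codecs.BOM_UTF8 + payload if write_bom else payload


if __name__ == "__main__":
    raw = codecs.BOM_UTF8 + "1\tfoo\n2\tbar\n".encode("utf-8")
    print(strip_leading_bom(raw))
    print(encode_with_optional_bom("1\tfoo\n", write_bom=True))
```

Unlike the tr one-liner above, which deletes every U+FEFF in the file, this
sketch only touches the first three bytes and leaves any later U+FEFF alone.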

Regards,

Brar
