Re: Support UTF-8 files with BOM in COPY FROM

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tatsuo Ishii <ishii(at)postgresql(dot)org>, david(at)kineticode(dot)com, itagaki(dot)takahiro(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Support UTF-8 files with BOM in COPY FROM
Date: 2011-09-26 18:49:16
Message-ID: 1317062957.29925.11.camel@vanquo.pezone.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On mån, 2011-09-26 at 14:44 -0400, Robert Haas wrote:
> > We did recently accept a patch for psql -f to skip over a UTF-8
> > byte-order mark. We had a lot of this same discussion there.
>
> But that case is different, because zero-width, non-breaking space has
> no particular meaning in an SQL script - it's either going to be
> ignored as a BOM, ignored as whitespace, or an error. But inside a
> file being subjected to COPY it might be confusable with data that the
> user wanted to end up in some table.

Yes, my point was more directed toward the discussion about whether BOM
in UTF-8 are valid at all. But your point pretty much kills this
altogether. If I store a BOM in row 1, column 1 of my table, because,
well, maybe it's an XML document or something, then it needs to be able
to survive a copy out and in. The only way we could proceed with this
would be if we prohibited BOMs in all user-data.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Brar Piening 2011-09-26 18:57:25 Re: Support UTF-8 files with BOM in COPY FROM
Previous Message Andrew Dunstan 2011-09-26 18:47:06 Re: Support UTF-8 files with BOM in COPY FROM