Re: UTF8 with BOM support in psql

From:	Peter Eisentraut <peter_e(at)gmx(dot)net>
To:	Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: UTF8 with BOM support in psql
Date:	2009-11-14 10:46:47
Message-ID:	1258195607.14314.20.camel@vanquo.pezone.net
Lists:	pgsql-hackers

On Wed, 2009-10-21 at 13:11 +0900, Itagaki Takahiro wrote:
> Client encoding is declared in the body of a file, but the BOM is
> at the head of the file. So, we should always ignore a BOM sequence
> at the file head no matter what client encoding is used.
>
> The attached patch replaces the BOM with white space, but it does not
> change client encoding automatically. I think we can always ignore
> client encoding at the replacement because an SQL command cannot start
> with a BOM sequence. If we don't ignore the sequence, execution of
> the script must fail with a syntax error.

I don't know what the best solution is here. The BOM encoded as UTF-8
is valid data in other encodings. Of course, there is your point that
such data cannot be at the start of an SQL command.
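
To illustrate the point about other encodings, here is a minimal Python
sketch (not psql code). It shows that the three bytes of a UTF-8 BOM also
decode without error as Latin-1, where they are three ordinary printable
characters. Latin-1 is picked here only as one example of a non-Unicode
client encoding.

```python
import codecs

# The UTF-8 encoding of U+FEFF is the three bytes EF BB BF.
bom = codecs.BOM_UTF8

# Interpreted as UTF-8, the bytes are a single code point, U+FEFF.
as_utf8 = bom.decode("utf-8")

# Interpreted as Latin-1, the same bytes are three valid characters.
as_latin1 = bom.decode("latin-1")

print(repr(as_utf8), repr(as_latin1))
```

So a byte-level check for EF BB BF cannot tell by itself whether it is
looking at a BOM or at legitimate Latin-1 text.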

There is also the notion of how files are handled on Unix. Normally,
you'd assume that all of

psql -f file.sql
psql < file.sql
cat file.sql | psql
cat file1.sql file2.sql | psql

behave consistently. That would require that the BOM be ignored in the
middle of the data stream (which is legal and required per the Unicode
standard) and that this happen only if the character set is actually
Unicode.
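
The consistency problem can be sketched in Python (again, not psql code;
the helper names strip_leading_bom and strip_all_boms and the sample file
contents are hypothetical). Concatenating two BOM-prefixed files leaves a
BOM in the middle of the stream, which a head-only strip does not remove.

```python
import codecs

BOM = codecs.BOM_UTF8  # b'\xef\xbb\xbf'

def strip_leading_bom(data: bytes) -> bytes:
    """Drop a BOM only at the very start, regardless of encoding."""
    return data[len(BOM):] if data.startswith(BOM) else data

def strip_all_boms(data: bytes, encoding: str) -> bytes:
    """Drop every BOM, but only when the declared encoding is UTF-8."""
    if codecs.lookup(encoding).name != "utf-8":
        return data  # EF BB BF may be real data in other encodings
    return data.replace(BOM, b"")

# Hypothetical contents of file1.sql and file2.sql, each saved with a BOM.
file1 = BOM + b"SELECT 1;\n"
file2 = BOM + b"SELECT 2;\n"

# Equivalent of: cat file1.sql file2.sql | psql
stream = file1 + file2

head_only = strip_leading_bom(stream)          # second BOM survives
everywhere = strip_all_boms(stream, "UTF8")    # both BOMs removed
untouched = strip_all_boms(stream, "LATIN1")   # left alone
```

Note that strip_all_boms would also remove a U+FEFF that appears inside a
string literal, so this is a simplification rather than a proposal.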

Any ideas?

In response to

Re: UTF8 with BOM support in psql at 2009-10-21 04:11:59 from Itagaki Takahiro

Responses

Re: UTF8 with BOM support in psql at 2009-11-14 13:06:01 from Andrew Dunstan
Re: UTF8 with BOM support in psql at 2009-11-17 08:59:25 from Chuck McDevitt
