Re: UTF8 with BOM support in psql

From: Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: UTF8 with BOM support in psql
Date: 2009-11-17 05:19:58
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> writes:
> > If encoding setting is reverted,
> >> "Eat BOM at beginning of file and <<set client encoding to UTF-8>>"
> > will be much safer.
> This isn't going to happen, so please stop wasting our time arguing
> about it.

Ok, sorry. But I still cannot accept this restriction.
>> - Only when client encoding is UTF-8 --> please fix that

The attachd patch is a new proposal of the feature.
When we found BOM at beginning of file, set "expected_encoding" to UTF8.
Before every execusion of query, if pset.encoding is not UTF8, we check the
query string not to contain any non-ASCII characters and throw an error if
found. Encoding declarations are typically written only in ascii characters,
so we can postpone encoding checking until non-ascii characters appear.

Since the default value of expected_encoding is SQL_ASCII, that pass
through all characters, so the patch does nothing to scripts without BOM.
(There are no codes to set expected_encoding except BOM.)
If client encoding is UTF8, it skips BOM and no effect to the script body.
BOMs are skipped even if client encoding is not set to UTF8, but can throw
an error if there are no explicit encoding declaration.

AFAIC, the patch can solve the almost problems in the discussions
developmentally. Comments welcome.

ITAGAKI Takahiro
NTT Open Source Software Center

Attachment Content-Type Size
psql-utf8bom_20091117.patch application/octet-stream 2.9 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message George Gensure 2009-11-17 05:41:53 Re: patch - Report the schema along table name in a referential failure error message
Previous Message Robert Haas 2009-11-17 04:25:11 Re: sgml and "empty" closing tags