From: | Peter Eisentraut <peter_e(at)gmx(dot)net> |
---|---|
To: | Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: UTF8 with BOM support in psql |
Date: | 2009-10-20 09:54:41 |
Message-ID: | 1256032481.9382.19.camel@fsopti579.F-Secure.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, 2009-10-20 at 14:41 +0900, Itagaki Takahiro wrote:
> UTF8 encoding text files with BOM (Byte Order Mark) are commonly
> used in Windows, though BOM was designed for UTF16 text originally.
> However, psql cannot read such format even if we set client encoding
> to UTF8. Is it worth supporting those format in psql?
psql doesn't have a problem, but the backend's lexer doesn't parse the
BOM as whitespace. Since the lexer is byte-based, it will presumably
have problems with anything outside of ASCII that Unicode considers
whitespace.
> When psql opens a file with -f or \i, it checks first 3 bytes of the
> file. If they are BOM, discard the 3 bytes and change client encoding
> to UTF8 automatically.
While I see that the Unicode standard supports using a UTF-8 encoded BOM
as UTF-8 signature, I wonder if those bytes can usefully appear in a
leading position in other encodings.
From | Date | Subject | |
---|---|---|---|
Next Message | Euler Taveira de Oliveira | 2009-10-20 10:58:40 | Re: ProcessUtility_hook |
Previous Message | Itagaki Takahiro | 2009-10-20 09:09:07 | ProcessUtility_hook |