Re: UTF8 with BOM support in psql

From:	Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: UTF8 with BOM support in psql
Date:	2009-10-21 04:11:59
Message-ID:	20091021114142.9561.52131E4D@oss.ntt.co.jp
Lists:	pgsql-hackers

David Christensen <david(at)endpoint(dot)com> wrote:

> Is that only when the default client encoding is set to UTF8
> (PGCLIENTENCODING, whatever), or will it be coded to work with the
> following:
>
> $ psql -f <file>
> Where <file> is:
> <BOM>
> SET CLIENT ENCODING 'utf8';

Sure. The client encoding is declared in the body of the file, but the
BOM is at the head of the file. So we should always ignore a BOM sequence
at the head of the file, no matter what client encoding is used.

The attached patch replaces the BOM with white spaces, but it does not
change the client encoding automatically. I think we can disregard the
client encoding when doing the replacement, because no SQL command can
start with a BOM sequence. If we don't ignore the sequence, execution of
the script is bound to fail with a syntax error.
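
To illustrate the replacement described above, here is a minimal sketch.
It is not the code in the attached patch; the function name
blank_leading_utf8_bom() is hypothetical, and it assumes the caller passes
only the first line read from the script file. It checks for the UTF-8
BOM bytes EF BB BF and overwrites them with spaces, without consulting
the client encoding.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, not taken from the patch: if the buffer starts
 * with the UTF-8 byte order mark (EF BB BF), overwrite those three bytes
 * with spaces and return true.  Otherwise leave the buffer untouched and
 * return false.  The caller is assumed to pass only the first line of
 * the file.
 */
bool
blank_leading_utf8_bom(char *line, size_t len)
{
    const unsigned char *p = (const unsigned char *) line;

    if (len >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF)
    {
        line[0] = ' ';
        line[1] = ' ';
        line[2] = ' ';
        return true;
    }
    return false;
}
```

Writing spaces instead of removing the bytes keeps the line length and
the offsets of the following characters unchanged.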

This patch does nothing about the COPY and \copy commands. It might be
possible to add BOM handling code around AllocateFile() in CopyFrom()
to support "COPY FROM 'utf8file-with-bom.tsv'", but we need another
approach for "COPY FROM STDIN". For example, the input stream of
$ cat utf8bom-1.tsv utf8bom-2.tsv | psql -c "COPY FROM STDIN"
might contain a BOM sequence in the middle.
Anyway, those changes would have to come in another patch in the future.
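
The concatenation problem can be shown with a small sketch. The helper
next_utf8_bom() below is hypothetical and is not part of any patch; it
only scans a byte buffer for the sequence EF BB BF. When two files that
each start with a BOM are concatenated, the second BOM appears at a
nonzero offset, so blanking only the head of the stream would miss it.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return the offset of the first UTF-8 BOM
 * (EF BB BF) found at or after 'start' in buf, or -1 if there is none.
 */
long
next_utf8_bom(const char *buf, size_t len, size_t start)
{
    static const char bom[] = "\xEF\xBB\xBF";
    size_t      i;

    for (i = start; i + 3 <= len; i++)
    {
        if (memcmp(buf + i, bom, 3) == 0)
            return (long) i;
    }
    return -1;
}
```

For a stream built from two BOM-prefixed files, a search from offset 0
finds the first BOM at 0, and a search from offset 3 finds the second one
at the start of the second file's data.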

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

Attachment	Content-Type	Size
psql-utf8bom_20091021.patch	application/octet-stream	737 bytes

In response to

Re: UTF8 with BOM support in psql at 2009-10-20 16:02:02 from David Christensen

Responses

Re: UTF8 with BOM support in psql at 2009-10-21 10:00:08 from Peter Eisentraut
Re: UTF8 with BOM support in psql at 2009-10-24 21:33:06 from Peter Eisentraut
Re: UTF8 with BOM support in psql at 2009-11-14 10:46:47 from Peter Eisentraut
Re: UTF8 with BOM support in psql at 2009-11-16 20:37:07 from Peter Eisentraut
