Re: Enforcing database encoding and locale match

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Gregory Stark <stark(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Enforcing database encoding and locale match
Date: 2007-09-28 23:45:37
Message-ID: 21612.1191023137@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM> writes:
> The another question is what do when we know that this codeset/encoding
> is not supported by postgres.

Ah, I finally grasped what you were on about here. As CVS HEAD stands,
if you run initdb in an unrecognized locale, you get something like

$ LANG=zh_CN.GB18030 initdb
The files belonging to this database system will be owned by user "tgl".
This user must also own the server process.

The database cluster will be initialized with locale zh_CN.GB18030.
could not determine encoding for locale "zh_CN.GB18030": codeset is "GB18030"
initdb: could not find suitable encoding for locale "zh_CN.GB18030"
Rerun initdb with the -E option.
Try "initdb --help" for more information.
$

which is OK, but if you override it incorrectly, it'll let you do so:

$ LANG=zh_CN.GB18030 initdb -E utf8
The files belonging to this database system will be owned by user "tgl".
This user must also own the server process.

The database cluster will be initialized with locale zh_CN.GB18030.
could not determine encoding for locale "zh_CN.GB18030": codeset is "GB18030"
... but it presses merrily along ...

leading to a database which is in fact broken.

To prevent this, I think it would be sufficient to add entries to the
table for our known frontend-only encodings. It's reasonable to assume
that anyone who wants to run Postgres will probably have a default
locale that uses *some* encoding that we support; otherwise he's going
to have a pretty unpleasant experience anyway. If the function returns
a frontend-only encoding value then initdb will fail in a good way,
since it won't let the user select that as a database encoding. So I
don't think we need an explicit concept of an unsupported encoding in
the table, just some more entries.

regards, tom lane

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2007-09-28 23:59:57 Re: Turn off vacuum in pgbench?
Previous Message Tom Lane 2007-09-28 22:58:53 Re: Enforcing database encoding and locale match