| From: | Devrim GUNDUZ <devrim(at)gunduz(dot)org> | 
|---|---|
| To: | pgsql-hackers(at)postgresql(dot)org | 
| Subject: | Locale-based identifier conversion and Turkish | 
| Date: | 2003-12-15 18:39:05 | 
| Message-ID: | 1071513545.18217.56.camel@devrim | 
| Lists: | pgsql-hackers | 
Hi,
A year ago Nicolai Tufar <ntufar(at)TDMSoft(dot)com> submitted a patch to
change lower-case conversion of identifiers from locale-dependent to
ASCII in this thread:
http://archives.postgresql.org/pgsql-hackers/2002-11/msg01159.php
Tom Lane argued that the SQL99 standard states that identifier case
conversions are to be done on the basis of Unicode upper/lower case
equivalencies, and that locale-based conversion is closer to that than
ASCII-only conversion. The patch was rejected.
Now, PostgreSQL 7.4 initdb fails if run with locale set to tr_TR:
----------------------------------------------------------------------------
[pgsql74(at)devrim backend]$ initdb -D /usr/local/pgsql/data --locale=tr_TR
The files belonging to this database system will be owned by user
"pgsql74".
This user must also own the server process.
 
The database cluster will be initialized with locale tr_TR.
 
fixing permissions on existing directory /usr/local/pgsql/data... ok
creating directory /usr/local/pgsql/data/base... ok
creating directory /usr/local/pgsql/data/global... ok
creating directory /usr/local/pgsql/data/pg_xlog... ok
creating directory /usr/local/pgsql/data/pg_clog... ok
selecting default max_connections... 100
selecting default shared_buffers... 1000
creating configuration files... ok
creating template1 database in /usr/local/pgsql/data/base/1... ok
initializing pg_shadow... ok
enabling unlimited row size for system tables... ok
initializing pg_depend... ok
creating system views... ok
loading pg_description... ok
creating conversions... NOTICE:  type "voıd" is not yet defined
DETAIL:  Creating a shell type definition.
ERROR:  type cstrıng does not exist
 
initdb: failed
[pgsql74(at)devrim backend]$
-----------------------------------------------------------------
Failure is caused by the following statement:
    CREATE OR REPLACE FUNCTION ascii_to_mic (INTEGER, INTEGER,
	 CSTRING,CSTRING, INTEGER) RETURNS VOID AS
	 '$libdir/ascii_and_mic', 'ascii_to_mic' 
	 LANGUAGE 'c' STRICT;
from file share/conversion_create.sql
As you can see, the "I" in "VOID" gets converted to dotless i (ı), in
conformance with the tr_TR locale's conversion rules. This is not the
behaviour that Turkish users who set their locale to tr_TR would expect.
Attached is a two-line patch that changes identifier name conversion in
backend/parser/scan.l from tolower() to a simple ASCII-based one. It
will solve the database creation problem, but apparently it will break
upper/lower case conversion of identifiers in national languages, like
Russian or Korean.
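To illustrate the idea, here is a minimal sketch of ASCII-only
downcasing. It is not the attached patch itself: the function name
ascii_downcase_ident is hypothetical, and the sketch assumes the
identifier is a NUL-terminated string that may be modified in place.
Only the bytes 'A'..'Z' are folded, and tolower() is never called, so
the result does not depend on the locale:

```c
#include <stddef.h>

/*
 * Hypothetical helper (not taken from scan.l): fold ASCII 'A'..'Z' to
 * lower case in place and leave every other byte untouched.  Because it
 * never calls tolower(), the current locale has no influence on it.
 */
static void
ascii_downcase_ident(char *ident)
{
    size_t i;

    for (i = 0; ident[i] != '\0'; i++)
    {
        unsigned char ch = (unsigned char) ident[i];

        if (ch >= 'A' && ch <= 'Z')
            ident[i] = (char) (ch + ('a' - 'A'));
    }
}
```

With something along these lines, "VOID" and "CSTRING" would come out
as "void" and "cstring" under any locale, while bytes outside the ASCII
range pass through unchanged, which is exactly why identifiers in
national character sets would no longer be case-folded.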
So what shall be done? Would you like us to prepare a patch that will
change identifier case conversion behaviour only when the locale is set
to tr_TR?
Regards,
--
Devrim GUNDUZ
devrim(at)gunduz(dot)org                  devrim(dot)gunduz(at)linux(dot)org(dot)tr
                http://www.TDMSoft.com
                http://www.gunduz.org
| Attachment | Content-Type | Size | 
|---|---|---|
| scan.l.diff | text/x-patch | 651 bytes | 