Re: Locale-dependent case conversion in {identifier}

From: Nicolai Tufar <ntufar(at)apb(dot)com(dot)tr>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Locale-dependent case conversion in {identifier}
Date: 2002-11-30 07:57:44
Message-ID: 3DE86F78.9000905@apb.com.tr
Lists: pgsql-advocacy pgsql-general pgsql-hackers

By no means would I try to convince you that your reading of
the SQL standard is wrong. What I am trying to say is
that the Turkish alphabet is broken beyond repair. And since
there is absolutely no way to change our alphabet, maybe
we can code a workaround in the code.

So I do not claim that your code is wrong. It is
behaving according to the specification. But unfortunately
the folks at SQL99 probably were not aware of the woes
of the Turkish "I".

The very special case of the letter "I" in Turkish is not
only PostgreSQL's problem. Many Java programs have
failed miserably trying to open files with "I"s in
pathnames.

So basically, there are two letters "I" in Turkish.
One is with a dot on top and the other is without.
The one with the dot on top always has the dot, and the one
without never has it. Simple. The problem is
with the standard Latin "I". So why does small "i"
have a dot while capital "I" does not?

Standard conversion is
Lower: "I" -> "ı" and "İ" -> "i".
Upper: "ı" -> "I" and "i" -> "İ".
(font may not be displayed correctly in your mail reader)

Historically, programs that operate in the Turkish locale have
chosen to hardcode the capitalisation of "i" in system
messages and identifier names like this:

Lower: "I" -> "i" and "İ" -> "i".
Upper: "ı" -> "I" and "i" -> "I".

With this, no matter what kind of "I" you used in names,
it is always going to end up a valid ASCII character.
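
To make the hardcoded mapping concrete, here is a minimal sketch in C.
It assumes single-byte ISO-8859-9 (Latin-5) text, where the dotted
capital I is byte 0xDD and the dotless small i is byte 0xFD. The
function and macro names are made up for illustration and are not taken
from any existing program. Other non-ASCII letters are left untouched,
which is a simplification.

```c
/* Assumption: ISO-8859-9 (Latin-5) single-byte encoding. */
#define TR_CAP_DOTTED_I     0xDD    /* capital I with dot on top */
#define TR_SMALL_DOTLESS_I  0xFD    /* small i without dot */

/* Hypothetical helper: lower-case one byte so that every kind of
 * capital "I" ends up as plain ASCII 'i'. */
static unsigned char
tr_safe_lower(unsigned char ch)
{
    if (ch == 'I' || ch == TR_CAP_DOTTED_I)
        return 'i';
    if (ch >= 'A' && ch <= 'Z')
        return (unsigned char) (ch - 'A' + 'a');
    return ch;                      /* everything else is unchanged */
}

/* Hypothetical helper: upper-case one byte so that every kind of
 * small "i" ends up as plain ASCII 'I'. */
static unsigned char
tr_safe_upper(unsigned char ch)
{
    if (ch == 'i' || ch == TR_SMALL_DOTLESS_I)
        return 'I';
    if (ch >= 'a' && ch <= 'z')
        return (unsigned char) (ch - 'a' + 'A');
    return ch;                      /* everything else is unchanged */
}
```

With these two helpers, an identifier typed with either kind of "I"
folds to the same ASCII spelling in both directions.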

Would it be acceptable if I submit a patch that applies this
special logic in src/backend/parser/scan.l if the locale is "tr_TR"?
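
Roughly, I imagine something like the sketch below. It is not the
actual scan.l code. The names locale_name_is_turkish and
downcase_identifier are invented for this example, and it assumes a
single-byte Latin-5 encoding where the dotted capital I is byte 0xDD.
For every other character it falls back to the usual locale-aware
tolower().

```c
#include <ctype.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: true for names such as "tr_TR" or
 * "tr_TR.ISO8859-9". */
static int
locale_name_is_turkish(const char *name)
{
    return name != NULL && strncmp(name, "tr_TR", 5) == 0;
}

/* Hypothetical helper: down-case an identifier in place.  Under a
 * Turkish locale both capital "I"s are forced to ASCII 'i'; all other
 * characters go through the normal locale-aware conversion. */
static void
downcase_identifier(char *ident, const char *ctype_locale)
{
    int     turkish = locale_name_is_turkish(ctype_locale);
    char   *p;

    for (p = ident; *p; p++)
    {
        unsigned char ch = (unsigned char) *p;

        if (turkish && (ch == 'I' || ch == 0xDD))   /* 0xDD: Latin-5 assumption */
            *p = 'i';
        else if (isupper(ch))
            *p = (char) tolower(ch);
    }
}
```

A caller could pass the current setting with
downcase_identifier(buf, setlocale(LC_CTYPE, NULL)), since
setlocale() with a NULL second argument only queries the locale name.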

Because for many folks, setting the locale to Turkish would
render their database unusable. God forbid your
SQL has a column name written in capitals that includes "I":
it will not work. So I firmly believe that the PostgreSQL community
has to provide a workaround for this problem.

So what should I do?

Best regards,
Nick

Tom Lane wrote:
> "Nicolai Tufar" <ntufar(at)apb(dot)com(dot)tr> writes:
>
>>So I have changed lower-case conversion code in scan.l to make it purely
>>ASCII-based.
>>as in keywords.c. Mini-patch is given below.
>
>
> Rather than offering a patch, you need to convince us why our reading of
> the SQL standard is wrong. ("Oracle does it that way" is not an
> argument that will carry a lot of weight.)
>
> SQL99 states that identifier case conversions are done on the basis of
> the Unicode upper/lower case equivalences, so it seems clear that they
> intend more than ASCII-only conversion for identifiers. Locale-based
> conversion might not be an exact implementation of the spec, but it's
> surely closer than ASCII-only.
>
> regards, tom lane
