Re: Collation rules and multi-lingual databases

From: Dennis Gearon <gearond(at)fireserve(dot)net>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-general(at)postgresql(dot)org
Subject: Re: Collation rules and multi-lingual databases
Date: 2003-08-22 16:19:03
Message-ID: 3F464277.2080304@fireserve.net
Lists: pgsql-general

I was thinking of IGNORING locale, since it is basically fixed for a DB
for long periods of time.

If a table/column HAD its own locale, that could be used,
but I was more interested in a function that would allow the explicit
declaration of the encoding(s) to look for.

BTW, what is l10n?

Greg Stark wrote:

>Greg Stark <gsstark(at)MIT(dot)EDU> writes:
>
>
>
>>Dennis Gearon <gearond(at)fireserve(dot)net> writes:
>>
>>
>>
>>>I think it would be nice, and I may write it eventually, to have a function
>>>called:
>>>
>>>COLLATION_VALUE( 'string', 'encoding' )
>>>
>>>
>>Indeed that would be really nice. I wish I had that and a pony.
>>
>>Unfortunately my understanding is that the collation rules are simply too
>>complex to allow such a function in general. It's too bad because it would
>>indeed eliminate a lot of the problems in a single swoop.
>>
>>
>
>Uh, so apparently I'm on crack and this is *precisely* how the l10n collation
>rules work. Sorry for jumping in with an uninformed opinion.
>
>
>
>> Effectively, the way these functions work is by applying a mapping to
>>transform the characters in a string to a byte sequence that represents
>>the string's position in the collating sequence of the current locale.
>>Comparing two such byte sequences in a simple fashion is equivalent to
>>comparing the strings with the locale's collating sequence.
>>
>> The functions `strcoll' and `wcscoll' perform this translation
>>implicitly, in order to do one comparison. By contrast, `strxfrm' and
>>`wcsxfrm' perform the mapping explicitly. If you are making multiple
>>comparisons using the same string or set of strings, it is likely to be
>>more efficient to use `strxfrm' or `wcsxfrm' to transform all the
>>strings just once, and subsequently compare the transformed strings
>>with `strcmp' or `wcscmp'.
>>
>>
>
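
To make the quoted manual text concrete, here is a minimal sketch in C of the transform-once, compare-many idea. The names make_sort_key, sign_of, and compare_by_keys are made up for illustration. It assumes the keys are built under whatever LC_COLLATE locale is current, and it only shows that comparing the keys with strcmp should agree in sign with strcoll.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'd sort key for s under the
 * current LC_COLLATE locale, or NULL if allocation fails.
 * The caller frees the result. */
char *make_sort_key(const char *s)
{
    /* With a zero-length buffer, strxfrm only reports the size needed
     * (not counting the terminating null byte). */
    size_t need = strxfrm(NULL, s, 0) + 1;
    char *key = malloc(need);
    if (key != NULL)
        strxfrm(key, s, need);
    return key;
}

/* Reduce a comparison result to -1, 0, or 1. */
int sign_of(int v)
{
    return (v > 0) - (v < 0);
}

/* Compare two strings through their keys. The result should match
 * sign_of(strcoll(a, b)) in the same locale. Returns 0 if a key
 * could not be allocated (a simplification for this sketch). */
int compare_by_keys(const char *a, const char *b)
{
    char *ka = make_sort_key(a);
    char *kb = make_sort_key(b);
    int r = (ka != NULL && kb != NULL) ? sign_of(strcmp(ka, kb)) : 0;
    free(ka);
    free(kb);
    return r;
}
```

In real use the keys would be computed once and stored, not rebuilt on every comparison as compare_by_keys does here.
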
>Given this it should be easy to write a collation_value(string,locale) C
>function that switches the collation order, calls strxfrm and then restores
>the collation order.
>
>I fear memory leaks or performance losses on frequent locale switches like
>this but it should be easy enough to try out. I don't see any problems with
>postgres as long as it's possible to ensure the locale is always switched back
>properly. It might not be thread-safe though.
>
>At worst I could always call strxfrm in the application for each locale I care
>about when inserting the data. That would bloat my tables for nothing though.
>
>So it's looking like I might get my pony after all.
>
>
>
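
For what it's worth, the collation_value(string, locale) idea above might look roughly like the following in plain C. This is only a sketch under stated assumptions: it is a standalone function, not a PostgreSQL extension; it assumes the named locale is installed on the system; and it switches LC_COLLATE for the whole process, so it is not thread-safe, as already noted above.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'd strxfrm key for str under the
 * named locale, or NULL if the locale cannot be selected or memory
 * runs out. The caller frees the result.
 * Not thread-safe: setlocale changes process-wide state. */
char *collation_value(const char *str, const char *locale_name)
{
    /* The string returned by setlocale may be overwritten by later
     * calls, so keep a private copy of the current setting. */
    const char *cur = setlocale(LC_COLLATE, NULL);
    if (cur == NULL)
        return NULL;
    char *saved = malloc(strlen(cur) + 1);
    if (saved == NULL)
        return NULL;
    strcpy(saved, cur);

    char *key = NULL;
    if (setlocale(LC_COLLATE, locale_name) != NULL) {
        /* First call sizes the key, second call fills it in. */
        size_t need = strxfrm(NULL, str, 0) + 1;
        key = malloc(need);
        if (key != NULL)
            strxfrm(key, str, need);
        /* Switch back to the collation order we started with. */
        setlocale(LC_COLLATE, saved);
    }
    /* If the switch failed, the locale was never changed. */
    free(saved);
    return key;
}
```

Keys produced for the same locale can then be compared with strcmp. Keys from different locales are not meaningfully comparable with each other.
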
