From: | Andrew Dunstan <andrew(at)dunslane(dot)net> |
---|---|
To: | Jan Urbański <j(dot)urbanski(at)students(dot)mimuw(dot)edu(dot)pl> |
Cc: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: proposal: UTF8 to_ascii function |
Date: | 2008-08-11 13:42:36 |
Message-ID: | 48A041CC.1090703@dunslane.net |
Lists: | pgsql-hackers |
Jan Urbański wrote:
> Andrew Dunstan wrote:
>>
>>
>> Pavel Stehule wrote:
>>>
>>>
>>> One note: convert_to is correct. But we have to use to_ascii without
>>> decode functions. It has the same behavior: it converts from bytea to
>>> text. Text in an "incorrect" encoding is de facto bytea. So the correct
>>> to_ascii function prototypes are:
>>>
>>> to_ascii(text)
>>> to_ascii(bytea, integer);
>>> to_ascii(bytea, name);
>>>
>>>
>>>>
>>
>> What you have not said is how you propose to convert UTF8 to ASCII.
>>
>> Currently to_ascii() converts a small number of single byte charsets
>> to ASCII by folding the chars with high bits set, so what we get is a
>> pure ASCII result which is safe in any server encoding, as they are
>> all ASCII supersets.
>>
>> But what conversion rule will you use for the gazillions of Unicode
>> characters?
>>
>> I honestly do not understand the use case for this at all.
>
> I do. Often clients want their searches to be insensitive to accents
> and language-specific letters, so that searching for 'łódź' returns
> 'lodz'. So the use case is there (in fact, the lack of such a
> facility made me consider not upgrading a particular client to
> 8.3...).
> Or maybe there's a better way to do it?

Well, my first question would be "Why aren't you using a database
encoding that supports to_ascii()?"

However, I suppose that your use case would support this signature:

    to_ascii(bytea, name)

where it would just error out if the encoding name were something other
than LATIN1, LATIN2, LATIN9, or WIN1250.
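
To make the intended behaviour concrete, here is a rough sketch in Python rather than in the server's C code. It is an illustration only, not PostgreSQL's implementation: the mapping to Python codec names, the small EXTRA_FOLDS table, and the choice of "?" for characters that cannot be folded are all assumptions made for this example.

```python
import unicodedata

# The four encodings named above, mapped to Python codec names (assumed mapping).
SUPPORTED = {
    "LATIN1": "iso8859_1",
    "LATIN2": "iso8859_2",
    "LATIN9": "iso8859_15",
    "WIN1250": "cp1250",
}

# Letters that have no Unicode decomposition need explicit entries.
# This table is illustrative and far from complete.
EXTRA_FOLDS = {"ł": "l", "Ł": "L", "đ": "d", "Đ": "D", "ø": "o", "Ø": "O", "ß": "ss"}


def to_ascii(data: bytes, encoding: str) -> str:
    """Fold bytes in one of the supported single-byte encodings to ASCII."""
    codec = SUPPORTED.get(encoding.upper())
    if codec is None:
        # Mirrors "just error out" for any other encoding name.
        raise ValueError(f"encoding {encoding!r} is not supported by to_ascii")
    out = []
    for ch in data.decode(codec):
        ch = EXTRA_FOLDS.get(ch, ch)
        for part in unicodedata.normalize("NFKD", ch):
            if unicodedata.combining(part):
                continue  # drop the accent mark left by decomposition
            # Anything still outside ASCII becomes "?" (an arbitrary choice here).
            out.append(part if ord(part) < 128 else "?")
    return "".join(out)
```

With this sketch, to_ascii("łódź".encode("iso8859_2"), "LATIN2") returns 'lodz', while to_ascii(b"abc", "UTF8") raises ValueError. Note that 'ł' has no decomposition, so it only folds because of the explicit table entry; that is a small instance of the conversion-rule problem for Unicode raised earlier in this thread.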

But what would be the meaning of this?:

    to_ascii(bytea, integer)

cheers

andrew