Re: proposal: UTF8 to_ascii function

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Jan Urbański <j(dot)urbanski(at)students(dot)mimuw(dot)edu(dot)pl>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: proposal: UTF8 to_ascii function
Date: 2008-08-11 13:42:36
Message-ID: 48A041CC.1090703@dunslane.net
Lists: pgsql-hackers

Jan Urbański wrote:
> Andrew Dunstan wrote:
>>
>>
>> Pavel Stehule wrote:
>>>
>>>
>>> One note - convert_to is correct. But we have to use to_ascii without
>>> decode functions. It behaves the same way: it converts from bytea to
>>> text. Text in an "incorrect" encoding is de facto bytea. So the correct
>>> to_ascii function prototypes are:
>>>
>>> to_ascii(text)
>>> to_ascii(bytea, integer);
>>> to_ascii(bytea, name);
>>>
>>>
>>>>
>>
>> What you have not said is how you propose to convert UTF8 to ASCII.
>>
>> Currently to_ascii() converts a small number of single byte charsets
>> to ASCII by folding the chars with high bits set, so what we get is a
>> pure ASCII result which is safe in any server encoding, as they are
>> all ASCII supersets.
>>
>> But what conversion rule will you use for the gazillions of Unicode
>> characters?
>>
>> I honestly do not understand the use case for this at all.
>
> I do. Often clients want their searches to be insensitive to accents
> and language-specific letters, so that searching for 'łódź' returns
> 'lodz'. So the use case is there (in fact, the lack of such a facility
> made me consider not upgrading a particular client to 8.3...).
> Or maybe there's a better way to do it?

Well, my first question would be "Why aren't you using a database
encoding that supports to_ascii()?"

However, I suppose that your use case would support this signature:

to_ascii(bytea, name)

where it would just error out if the encoding name were something other
than LATIN1, LATIN2, LATIN9, or WIN1250.
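
To make that signature concrete, here is a rough sketch in Python rather than in the backend's C. It is not how to_ascii() is implemented; it only illustrates the shape of the idea. The codec names, the EXTRA_FOLDS table, and the choice to turn anything left over into '?' are all assumptions made for the example.

```python
import unicodedata

# Assumed mapping from the encoding names listed above to Python codec names.
SUPPORTED = {
    "LATIN1": "latin_1",
    "LATIN2": "iso8859_2",
    "LATIN9": "iso8859_15",
    "WIN1250": "cp1250",
}

# Letters that have no Unicode decomposition need an explicit fold.
# This table is illustrative and deliberately incomplete.
EXTRA_FOLDS = str.maketrans({
    "ł": "l", "Ł": "L",
    "đ": "d", "Đ": "D",
    "ø": "o", "Ø": "O",
    "ß": "ss",
})


def to_ascii(data: bytes, encoding: str) -> str:
    """Fold bytes in a named single-byte encoding down to plain ASCII."""
    codec = SUPPORTED.get(encoding.upper())
    if codec is None:
        # Error out for anything other than the four supported encodings.
        raise ValueError(f"encoding {encoding!r} is not supported by to_ascii")
    text = data.decode(codec).translate(EXTRA_FOLDS)
    # Split accented letters into base letter + combining mark, drop the marks.
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Assumption: whatever is still non-ASCII becomes '?'.
    return stripped.encode("ascii", "replace").decode("ascii")


print(to_ascii("łódź".encode("iso8859_2"), "LATIN2"))
```

Note that 'ł' only folds to 'l' because of the explicit table entry; decomposition alone leaves it untouched, which is a small instance of the conversion-rule question for the wider Unicode range.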

But what would be the meaning of this?:

to_ascii(bytea, integer)

cheers

andrew
