Re: PostgreSQL 8.3.7: soundex function returns UTF-16 characters

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Frans <frans(at)geodan(dot)nl>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: PostgreSQL 8.3.7: soundex function returns UTF-16 characters
Date: 2009-04-06 17:18:48
Message-ID: 8241.1239038328@sss.pgh.pa.us
Lists: pgsql-bugs

Frans <frans(at)geodan(dot)nl> writes:
> Tom Lane wrote:
>> The
>> fuzzystrmatch module doesn't really work with utf8 (nor any other
>> multibyte encoding), because it depends on the <ctype.h> functions.
>> What you'll probably get when applying it to non-ascii utf8 is
>> an invalidly encoded string.
>>
> Well, in 8.2.6 the result for non-ASCII UTF-8 was an empty string (ASCII
> code 0).

A comparison of the 8.2 and 8.3 fuzzystrmatch sources shows no
difference. The behavior of the ascii() function has indeed changed,
but soundex() is no more nor less broken than it was before.

[ thinks for a bit... ] If you are seeing a difference in what soundex
itself does, the most likely explanation is a difference in the behavior
of isalpha() or perhaps toupper(). Are you running on the same
underlying C library as before? Are you quite sure you have the same
encoding and locale selected?
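
As a rough illustration of why the C library and locale matter here, the following is a minimal sketch and not the fuzzystrmatch code itself. The helper names use_ctype_locale, count_alpha_bytes and upcase_bytes are hypothetical, invented for this example. It applies isalpha() and toupper() one byte at a time, so a multibyte UTF-8 character is seen as separate bytes, and what happens to each byte depends on the LC_CTYPE locale in effect.

```c
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: select the LC_CTYPE locale by name.
 * Returns 1 on success, 0 if the C library does not know the locale. */
int use_ctype_locale(const char *name)
{
    return setlocale(LC_CTYPE, name) != NULL;
}

/* Hypothetical helper: count the bytes of s that isalpha() accepts
 * under the current LC_CTYPE locale. Works byte by byte, so it has no
 * notion of multibyte characters. */
size_t count_alpha_bytes(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
    {
        /* Cast to unsigned char: passing a negative char value to the
         * <ctype.h> functions is undefined behavior. */
        if (isalpha((unsigned char) *s))
            n++;
    }
    return n;
}

/* Hypothetical helper: apply toupper() to every byte of s in place and
 * return how many bytes were changed. If the locale maps a byte that is
 * part of a UTF-8 sequence, the sequence is no longer validly encoded. */
size_t upcase_bytes(char *s)
{
    size_t changed = 0;

    for (; *s != '\0'; s++)
    {
        unsigned char before = (unsigned char) *s;
        unsigned char after = (unsigned char) toupper(before);

        if (after != before)
        {
            *s = (char) after;
            changed++;
        }
    }
    return changed;
}
```

In the "C" locale, isalpha() accepts only the 26 unaccented letters in each case, so both bytes of a UTF-8 "é" (0xC3 0xA9) are rejected and toupper() leaves them alone. Under a single-byte locale such as a Latin-1 one, the same bytes may be classified as letters and case-mapped individually; that part varies between C libraries and installed locales, so comparing the two systems with something like this is one way to check.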

regards, tom lane
