Re: Remaining dependency on setlocale()

From: Peter Eisentraut <peter(at)eisentraut(dot)org>
To: Jeff Davis <pgsql(at)j-davis(dot)com>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Remaining dependency on setlocale()
Date: 2026-01-06 19:54:15
Message-ID: 108e07a2-0632-4f00-984d-fe0e0d0ec726@eisentraut.org
Lists: pgsql-hackers

On 23.12.25 21:09, Jeff Davis wrote:
> On Wed, 2025-12-17 at 11:39 +0100, Peter Eisentraut wrote:
>> For Metaphone, I found the reference implementation linked from its
>> Wikipedia page, and it looks like our implementation is pretty closely
>> aligned to that.  That reference implementation also contains the
>> C-with-cedilla case explicitly.  The correct fix here would probably be
>> to change the implementation to work on wide characters.  But I think
>> for the moment you could try a shortcut like, use pg_ascii_toupper(),
>> but if the encoding is LATIN1 (or LATIN9 or whichever other encodings
>> also contain C-with-cedilla at that code point), then explicitly
>> uppercase that one as well.  This would preserve the existing behavior.
>
> Done, attached new patches.
>
> Interestingly, WIN1256 encodes only the SMALL LETTER C WITH CEDILLA. I
> think, for the purposes here, we can still consider it to "uppercase"
> to \xc7, so that it can still be treated as the same sound. Technically
> I think that would be an improvement over the current code in this edge
> case, and suggests that case folding would be a better approach than
> uppercasing.
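
For reference, the shortcut discussed in the quoted text could look
roughly like the following. This is only a sketch, not the code from the
earlier patches: the names sketch_ascii_toupper() and
sketch_metaphone_upcase() are hypothetical, the first is a stand-in for
pg_ascii_toupper(), and the caller is assumed to decide whether the
encoding has small c-with-cedilla at byte 0xE7.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for pg_ascii_toupper(): touches ASCII a-z only. */
static unsigned char
sketch_ascii_toupper(unsigned char ch)
{
    if (ch >= 'a' && ch <= 'z')
        ch = (unsigned char) (ch - 'a' + 'A');
    return ch;
}

/*
 * Uppercase a single-byte-encoded buffer in place for Metaphone purposes.
 *
 * enc_has_c_cedilla is an assumption of this sketch: the caller sets it
 * when the encoding stores SMALL LETTER C WITH CEDILLA at 0xE7 (LATIN1,
 * for example).  That byte is then mapped to 0xC7 so both cases are
 * treated as the same sound; every other non-ASCII byte is left alone.
 */
static void
sketch_metaphone_upcase(unsigned char *s, size_t len, bool enc_has_c_cedilla)
{
    assert(s != NULL);

    for (size_t i = 0; i < len; i++)
    {
        if (enc_has_c_cedilla && s[i] == 0xE7)
            s[i] = 0xC7;
        else
            s[i] = sketch_ascii_toupper(s[i]);
    }
}
```

The per-encoding flag is exactly the kind of hard-coding that this
approach requires for every encoding it wants to cover.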

On further reflection, it seems just as easy to have dmetaphone() take
the input collation and use that to do a proper collation-aware
upper-casing. This has the same effect (that is, it will still only
support certain single-byte encodings), but it avoids elaborately
hard-coding a bunch of things, and if we ever want to make this
multibyte-aware, then we'll have to go this way anyway, I think. See
attached patch.
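
As a rough standalone illustration of the idea (this is not the attached
patch), the sketch below upper-cases a single-byte buffer through an
explicit locale object instead of the process-wide setlocale() state. It
assumes a POSIX.1-2008 system providing locale_t and toupper_l(); the
function name sketch_upper_with_locale() is hypothetical, and the
locale_t merely stands in for whatever the function's input collation
would resolve to in the backend.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/*
 * Uppercase a single-byte-encoded buffer in place, using the case
 * mapping of the given locale object.  No global locale is consulted,
 * so the result depends only on the arguments.
 */
static void
sketch_upper_with_locale(char *s, size_t len, locale_t loc)
{
    assert(s != NULL);

    for (size_t i = 0; i < len; i++)
        s[i] = (char) toupper_l((unsigned char) s[i], loc);
}
```

A caller would create the locale object once, for example with
newlocale(LC_CTYPE_MASK, "C", (locale_t) 0), pass it in, and release it
with freelocale() when done. Which non-ASCII bytes get upper-cased then
follows from the locale rather than from a hard-coded list of encodings.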

Attachment: 0001-Make-dmetaphone-collation-aware.patch.nocfbot (text/plain, 2.7 KB)
