pgsql: Make dmetaphone collation-aware

From: Peter Eisentraut <peter(at)eisentraut(dot)org>
To: pgsql-committers(at)lists(dot)postgresql(dot)org
Subject: pgsql: Make dmetaphone collation-aware
Date: 2026-01-12 07:36:34
Message-ID: E1vfCTZ-006C3J-1e@gemulon.postgresql.org
Lists: pgsql-committers

Make dmetaphone collation-aware

The dmetaphone() SQL function internally upper-cases its argument
string. It did this using the toupper() function, which makes it
depend on the global LC_CTYPE locale setting, a dependency we want to
get rid of.

The "double metaphone" algorithm specifically supports the "C with
cedilla" letter, so just using ASCII case conversion wouldn't work.
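To illustrate why ASCII-only case conversion falls short, here is a
minimal standalone sketch, not the code from dmetaphone.c. It assumes a
Latin-1 (ISO 8859-1) encoded input, where lower-case c with cedilla is
byte 0xE7 and upper-case is 0xC7. The helper names ascii_upper_byte,
latin1_upper_byte, and upper_in_place are hypothetical and exist only
for this example.

```c
#include <stddef.h>

/*
 * ASCII-only upper-casing: only 'a'..'z' are mapped; every other byte,
 * including Latin-1 0xE7 (c with cedilla), passes through unchanged.
 */
static unsigned char
ascii_upper_byte(unsigned char c)
{
    return (c >= 'a' && c <= 'z') ? (unsigned char) (c - 'a' + 'A') : c;
}

/*
 * Hypothetical Latin-1-aware variant, for illustration only: in addition
 * to ASCII it maps 0xE7 (lower-case c with cedilla) to 0xC7 (upper-case).
 * A real implementation would consult the collation's locale instead of
 * hard-coding one letter.
 */
static unsigned char
latin1_upper_byte(unsigned char c)
{
    if (c == 0xE7)
        return 0xC7;
    return ascii_upper_byte(c);
}

/* Upper-case a buffer of len bytes in place using the given byte mapping. */
static void
upper_in_place(unsigned char *s, size_t len,
               unsigned char (*map) (unsigned char))
{
    for (size_t i = 0; i < len; i++)
        s[i] = map(s[i]);
}
```

With the ASCII-only mapping, the 0xE7 byte stays lower-case, so a later
check for the upper-case letter would not match it. The locale-aware
mapping is what lets such a check succeed.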

To fix that, use the passed-in collation and use the str_toupper()
function, which has full awareness of collations and collation
providers.

Note that this does not change the fact that this function only works
correctly with single-byte encodings. The change to str_toupper()
makes the case conversion multibyte-enabled, but the rest of the
function is still not multibyte-ready.
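
As a rough illustration of the single-byte limitation, the sketch below
assumes a byte-wise check for the Latin-1 code of upper-case C with
cedilla (0xC7). It is a hypothetical helper, not taken from
dmetaphone.c. In UTF-8 the same letter is the two-byte sequence
0xC3 0x87, so a byte-wise comparison never sees 0xC7.

```c
#include <stddef.h>

/*
 * Hypothetical byte-wise scan: returns 1 if any single byte equals 0xC7,
 * the Latin-1 code for upper-case C with cedilla, and 0 otherwise.
 * This only works when each character occupies exactly one byte.
 */
static int
contains_latin1_c_cedilla(const unsigned char *s, size_t len)
{
    for (size_t i = 0; i < len; i++)
    {
        if (s[i] == 0xC7)
            return 1;
    }
    return 0;
}
```

Called on the Latin-1 bytes of the letter, the scan finds it. Called on
the UTF-8 bytes 0xC3 0x87, it does not, even though the text contains
the same letter.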

Reviewed-by: Jeff Davis <pgsql(at)j-davis(dot)com>
Discussion: https://www.postgresql.org/message-id/108e07a2-0632-4f00-984d-fe0e0d0ec726%40eisentraut.org

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/e39ece0343fef7bf5a689d75bbafff9386e6e3da

Modified Files
--------------
contrib/fuzzystrmatch/dmetaphone.c | 27 ++++++++++++++++-----------
1 file changed, 16 insertions(+), 11 deletions(-)
