Sorting by respecting diacritics/accents

From: JānisE <janise(at)inbox(dot)lv>
To: "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Sorting by respecting diacritics/accents
Date: 2025-07-25 10:05:17
Message-ID: 1814742357.238678.1753437917809@w8
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hello! I seem to not be able to get PostgreSQL to sort rows by a string column respecting the diacritics. I read [1] that it's possible to define a custom collation having collation strength "ks" set to "level2", which would mean that it's accent-sensitive. However, when I try to actually sort using that collation, the order seem to be accent-insensitive. For example: CREATE TABLE test (string text); INSERT INTO test VALUES ('bar'), ('bat'), ('bär'); CREATE COLLATION "und1" (provider = icu, deterministic = false, locale = 'und-u-ks-level1'); CREATE COLLATION "und2" (provider = icu, deterministic = false, locale = 'und-u-ks-level2'); CREATE COLLATION "und3" (provider = icu, deterministic = false, locale = 'und-u-ks-level3'); SELECT * FROM test ORDER BY string collate "und1"; SELECT * FROM test ORDER BY string collate "und2"; SELECT * FROM test ORDER BY string collate "und3"; All three collations give me the same order: bar < bär < bat, although an accent-sensitive order would be bar < bat < bär This does lose "bär", meaning that those strength levels do have some kind of an effect on "DISTINCT": SELECT DISTINCT string COLLATE "und1" FROM test; But it's not working on "ORDER BY". Do I misunderstand the collation capabilities? Is there a way to actually get an accent-sensitive order? Also, is there a way to see what options are there for the default built-in collations? I don't see, for example, the used "ks" level in the "pg_collation" table data. Best regards, Janis [1]  https://www.postgresql.org/docs/current/collation.html#ICU-COLLATION-COMPARISON-LEVELS

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Mark 2025-07-25 12:26:16 Upper / lower case keys in JSON objects
Previous Message Pierre Barre 2025-07-25 09:25:48 Re: PostgreSQL on S3-backed Block Storage with Near-Local Performance