Problem in 'ORDER BY' of a column using a created collation?

From: Nishant Sharma <nishant(dot)sharma(at)enterprisedb(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Problem in 'ORDER BY' of a column using a created collation?
Date: 2025-08-25 07:51:45
Message-ID: CADrsxdbyWg75xV6sBeVCw76UE38JTW=1GJha4GAWP7jB0AK1ow@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

We encountered a strange behavior in ordering a column using a
created collation. Here are the experiments details:

Experiment 1:-
SQL File : PG_Exp_1.sql

Actual Output : PG_Exp_1.out

Created COLLATION : CREATE COLLATION test_coll (
provider = icu, locale = 'ja-u-kr-latn-digit');

'SELECT' Queries :
SELECT * FROM test_table ORDER BY value1 COLLATE
test_coll, value2 COLLATE test_coll;
SELECT * FROM test_table ORDER BY value2 COLLATE
test_coll, value1 COLLATE test_coll;

Expectation : All alphabets should come before all digits.

Seen Behavior : Column 1 in the 'ORDER BY' i.e value1 of the first
'SELECT' and Column 1 in the 'ORDER BY' i.e value2 of the second
'SELECT' is giving the correct order. But Column 2 in the 'ORDER BY'
i.e value2 in the first 'SELECT' and Column 2 in the 'ORDER BY' i.e
value1 in the second 'SELECT' is NOT giving the correct order.

Experiment 2:-
SQL File : PG_Exp_2.sql

Actual Output : PG_Exp_2.out

Created 'COLLATION' : CREATE COLLATION test_coll (
provider = icu, locale = 'ja-u-kr-digit-latn');

'SELECT' Queries : Same as 'Experiment 1'.

Expectation : All digits should come before all alphabets.

Seen Behavior : Matching with expectation. Column 1 in the
'ORDER BY' i.e value1 of the first 'SELECT' and Column 1 in the
'ORDER BY' i.e value2 of the second 'SELECT' is giving the correct
order. And Column 2 in the 'ORDER BY' i.e value2 in the first
'SELECT' and Column 2 in the 'ORDER BY' i.e value1 in the
second 'SELECT' is giving the correct order.

We did debug 'Experiment 1' and we find that:-
Whatever is the Column 1 in 'ORDER BY' gets correctly ordered,
because it uses abbreviated sort optimization due to which its data
datum gets converted to abbreviated datum using
"varstr_abbrev_convert()" function, and then the comparator
function selected is
"ssup->comparator = ssup_datum_unsigned_cmp()"
for sorting operation. But in case of column 2 in 'ORDER BY' (which
is showing incorrect result for 'Experiment 1') does not use
abbreviated sort optimization and here comparator function selected
is "ssup->comparator = varlenafastcmp_locale" -->
"strncoll_icu_utf8()", which appears, uses the third party ICU
library function for comparison and does not work as expected.

Need help in confirming why 'Experiment 1' is behaving as mentioned
above -
1. If our expectation of 'Experiment 1' is wrong?
2. Bug in abbreviated sort optimization?
3. Bug in third party comparator function "strncoll_icu_utf8()"?
4. Any other aspects which we are missing?
5. Or everything appears good?

PFA, the experiment files.

Thanks & Regards,
Nishant Sharma.
EnterpriseDB, Pune, India.

Attachment Content-Type Size
PG_Exp_1.sql application/octet-stream 1.0 KB
PG_Exp_2.sql application/octet-stream 1.0 KB
PG_Exp_1.out application/octet-stream 710 bytes
PG_Exp_2.out application/octet-stream 710 bytes

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2025-08-25 08:02:39 Re: Fixes a trivial bug in dumped parse/query/plan trees
Previous Message Zhijie Hou (Fujitsu) 2025-08-25 07:22:56 RE: Add support for specifying tables in pg_createsubscriber.