From: | Nishant Sharma <nishant(dot)sharma(at)enterprisedb(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Problem in 'ORDER BY' of a column using a created collation? |
Date: | 2025-08-25 07:51:45 |
Message-ID: | CADrsxdbyWg75xV6sBeVCw76UE38JTW=1GJha4GAWP7jB0AK1ow@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
We encountered a strange behavior in ordering a column using a
created collation. Here are the experiments details:
Experiment 1:-
SQL File : PG_Exp_1.sql
Actual Output : PG_Exp_1.out
Created COLLATION : CREATE COLLATION test_coll (
provider = icu, locale = 'ja-u-kr-latn-digit');
'SELECT' Queries :
SELECT * FROM test_table ORDER BY value1 COLLATE
test_coll, value2 COLLATE test_coll;
SELECT * FROM test_table ORDER BY value2 COLLATE
test_coll, value1 COLLATE test_coll;
Expectation : All alphabets should come before all digits.
Seen Behavior : Column 1 in the 'ORDER BY' i.e value1 of the first
'SELECT' and Column 1 in the 'ORDER BY' i.e value2 of the second
'SELECT' is giving the correct order. But Column 2 in the 'ORDER BY'
i.e value2 in the first 'SELECT' and Column 2 in the 'ORDER BY' i.e
value1 in the second 'SELECT' is NOT giving the correct order.
Experiment 2:-
SQL File : PG_Exp_2.sql
Actual Output : PG_Exp_2.out
Created 'COLLATION' : CREATE COLLATION test_coll (
provider = icu, locale = 'ja-u-kr-digit-latn');
'SELECT' Queries : Same as 'Experiment 1'.
Expectation : All digits should come before all alphabets.
Seen Behavior : Matching with expectation. Column 1 in the
'ORDER BY' i.e value1 of the first 'SELECT' and Column 1 in the
'ORDER BY' i.e value2 of the second 'SELECT' is giving the correct
order. And Column 2 in the 'ORDER BY' i.e value2 in the first
'SELECT' and Column 2 in the 'ORDER BY' i.e value1 in the
second 'SELECT' is giving the correct order.
We did debug 'Experiment 1' and we find that:-
Whatever is the Column 1 in 'ORDER BY' gets correctly ordered,
because it uses abbreviated sort optimization due to which its data
datum gets converted to abbreviated datum using
"varstr_abbrev_convert()" function, and then the comparator
function selected is
"ssup->comparator = ssup_datum_unsigned_cmp()"
for sorting operation. But in case of column 2 in 'ORDER BY' (which
is showing incorrect result for 'Experiment 1') does not use
abbreviated sort optimization and here comparator function selected
is "ssup->comparator = varlenafastcmp_locale" -->
"strncoll_icu_utf8()", which appears, uses the third party ICU
library function for comparison and does not work as expected.
Need help in confirming why 'Experiment 1' is behaving as mentioned
above -
1. If our expectation of 'Experiment 1' is wrong?
2. Bug in abbreviated sort optimization?
3. Bug in third party comparator function "strncoll_icu_utf8()"?
4. Any other aspects which we are missing?
5. Or everything appears good?
PFA, the experiment files.
Thanks & Regards,
Nishant Sharma.
EnterpriseDB, Pune, India.
Attachment | Content-Type | Size |
---|---|---|
PG_Exp_1.sql | application/octet-stream | 1.0 KB |
PG_Exp_2.sql | application/octet-stream | 1.0 KB |
PG_Exp_1.out | application/octet-stream | 710 bytes |
PG_Exp_2.out | application/octet-stream | 710 bytes |
From | Date | Subject | |
---|---|---|---|
Next Message | Daniel Gustafsson | 2025-08-25 08:02:39 | Re: Fixes a trivial bug in dumped parse/query/plan trees |
Previous Message | Zhijie Hou (Fujitsu) | 2025-08-25 07:22:56 | RE: Add support for specifying tables in pg_createsubscriber. |