Quick Links

Re: Problem in 'ORDER BY' of a column using a created collation?

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Nishant Sharma <nishant(dot)sharma(at)enterprisedb(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Problem in 'ORDER BY' of a column using a created collation?
Date:	2025-09-26 13:15:51
Message-ID:	CA+TgmobWgZC-JkS4OKbxQsjuBh7mpKMPw5kxtcHJgN-7+vGgCg@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Mon, Aug 25, 2025 at 3:52 AM Nishant Sharma
<nishant(dot)sharma(at)enterprisedb(dot)com> wrote:
> We did debug 'Experiment 1' and we find that:-
> Whatever is the Column 1 in 'ORDER BY' gets correctly ordered,
> because it uses abbreviated sort optimization due to which its data
> datum gets converted to abbreviated datum using
> "varstr_abbrev_convert()" function, and then the comparator
> function selected is
> "ssup->comparator = ssup_datum_unsigned_cmp()"
> for sorting operation. But in case of column 2 in 'ORDER BY' (which
> is showing incorrect result for 'Experiment 1') does not use
> abbreviated sort optimization and here comparator function selected
> is "ssup->comparator = varlenafastcmp_locale" -->
> "strncoll_icu_utf8()", which appears, uses the third party ICU
> library function for comparison and does not work as expected.

Generally, there are two methods for performing collation-aware string
comparisons, and they are expected to produce equivalent results. For
libc collations, the first method is to use strcoll() or strcoll_l()
or similar to directly compared the strings, and the other is to use
strxfrm() or similar to convert the string to a normalize
representation which can then be compared using memcmp(). Equivalent
facilities exist for ICU; see collate_methods_icu and
collate_methods_icu_utf8 in pg_local_icu.c; one or the other of those
should be getting used here.

Even though these two methods are expected to produce equivalent
results, if there's a bug, they might not. That bug could either exist
in our code or it could exist in ICU; I suspect the latter is more
likely, but the former is certainly possible. What I think you want to
do is try to track down two specific strings where transforming them
via strnxfrm_icu() produces results that compare in one order using
memcmp(), but passing the same strings to strncoll_icu() or
strncoll_icu_utf8() -- whichever is appropriate -- produces a
different result. If you're able to find such strings, then you can
probably rephrase those examples in terms of whatever underlying
functions strncoll_icu() and strxfrm_icu() are calling and we can
maybe report the problem to the ICU maintainers. If no such strings
seem to exist, then there's probably a problem in the PostgreSQL code
someplace, and you should try to figure out why we're getting the
wrong answer despite ICU apparently doing the right thing.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Problem in 'ORDER BY' of a column using a created collation? at 2025-08-25 07:51:45 from Nishant Sharma

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Álvaro Herrera	2025-09-26 13:22:48	Re: refactor backend type lists
Previous Message	Ashutosh Bapat	2025-09-26 12:44:28	Re: Report bytes and transactions actually sent downtream