Re: strcmp() tie-breaker for identical ICU-collated strings

From: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: strcmp() tie-breaker for identical ICU-collated strings
Date: 2017-06-09 15:05:53
Message-ID: 313838f0-49be-17cb-a812-ab14ee9e8ff9@2ndquadrant.com
Lists: pgsql-hackers

On 6/9/17 10:31, Robert Haas wrote:
> + * In some locales strcoll() can claim that nonidentical strings are
> + * equal. Believing that would be bad news for a number of reasons,
> + * so we follow Perl's lead and sort "equal" strings according to
> + * strcmp().
>
> Again, however, the reasons why believing it would be bad news are not
> enumerated. It is merely asserted that there is more than one such
> reason.

I suspect that there were just issues that haven't been thought through
yet, including hashing.
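
For readers following along, the tie-break described in the quoted comment
can be sketched roughly as below. This is an illustration only, not the
actual PostgreSQL code: ci_collate() is a hypothetical stand-in for a
collator that reports non-identical strings as equal (here, plain ASCII
case-insensitive comparison), and compare_with_tiebreak() is an assumed
name for the wrapper.

```c
#include <ctype.h>
#include <string.h>

/* Signature shared by strcoll() and the stand-in collator below. */
typedef int (*collate_fn) (const char *, const char *);

/*
 * Hypothetical collator, for illustration only: ASCII case-insensitive
 * comparison, so "abc" and "ABC" are reported as equal.
 */
static int
ci_collate(const char *a, const char *b)
{
    while (*a && *b)
    {
        int ca = tolower((unsigned char) *a);
        int cb = tolower((unsigned char) *b);

        if (ca != cb)
            return (ca < cb) ? -1 : 1;
        a++;
        b++;
    }
    if (*a)
        return 1;               /* a is longer */
    if (*b)
        return -1;              /* b is longer */
    return 0;
}

/*
 * Compare with the collator first; if it claims equality, fall back to
 * strcmp() so that only byte-for-byte identical strings compare as equal.
 */
static int
compare_with_tiebreak(collate_fn collate, const char *a, const char *b)
{
    int result = collate(a, b);

    if (result == 0)
        result = strcmp(a, b);
    return result;
}
```

With this wrapper, compare_with_tiebreak(ci_collate, "abc", "ABC") is
nonzero even though ci_collate() alone returns 0, and passing strcoll as
the collator gives the behavior the comment describes. Because equality
then coincides with byte equality, hashing the raw bytes stays consistent
with it; dropping the tie-break is what would force hashing to be
revisited.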

More generally, the code's receptiveness to internationalization issues
is ever expanding. Early code probably also thought that using
multibyte characters or non-C locales was bad news. Over time, we have
worked those issues out. This might just be one more.

> So, what's special about text that it can never report two
> non-byte-for-byte values as equal? And could we consider changing
> that, so that users can select an ICU collator and get exactly the
> behavior ICU delivers, without the extra tiebreak?

I don't think there is anything special. We just need to work through
the details.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
