Re: case insensitive collation of Greek's sigma

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Cc: Jakub Jedelsky <jakub(dot)jedelsky(at)gooddata(dot)com>, pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: case insensitive collation of Greek's sigma
Date: 2021-12-01 19:49:24
Message-ID: 1989905.1638388164@sss.pgh.pa.us
Lists: pgsql-general

Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com> writes:
> Running lower() like this is really the wrong thing to do. We should be
> doing "case folding" instead, which normalizes these differences for the
> purpose of case-insensitive comparisons.

That just begs the question: if tolower (or towlower) isn't the
appropriate API, what is? Perhaps ICU has something for a more
generalized notion of case-similarity, but I'm not aware of any such
thing in the POSIX API.

BTW, I think it's only accidental that the regex example shown upthread
gets the right answer. In that example, what's happening is that we
consider a letter in a case-insensitive regex to match itself, or
tolower() of itself, or toupper() of itself. Both σ and ς have Σ
as toupper() so they both work. But if you'd written Σ in the regex,
only one of σ and ς would match that as a data character. (Haven't
actually tested this, but given the way the code works I'm pretty
sure it's so.) Again, it's hard to see how to do better atop a POSIX
locale library.

regards, tom lane
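
The matching rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not PostgreSQL's regex code: the helper names `matches_by_case_mapping` and `matches_by_case_folding` are hypothetical, and Python's `str.lower()`, `str.upper()`, and `str.casefold()` stand in for the C library's `towlower()`/`towupper()` and for a Unicode case-folding operation. Results from a real POSIX locale may differ.

```python
# Illustrative model only; helper names are hypothetical, and Python's
# Unicode tables stand in for a POSIX locale's towlower()/towupper().

SMALL_SIGMA = "\u03c3"    # σ
FINAL_SIGMA = "\u03c2"    # ς
CAPITAL_SIGMA = "\u03a3"  # Σ


def matches_by_case_mapping(pattern_ch: str, data_ch: str) -> bool:
    """Rule from the message: a pattern letter matches itself,
    the lowercase of itself, or the uppercase of itself."""
    candidates = {pattern_ch, pattern_ch.lower(), pattern_ch.upper()}
    return data_ch in candidates


def matches_by_case_folding(pattern_ch: str, data_ch: str) -> bool:
    """Alternative: compare the case-folded forms of both characters."""
    return pattern_ch.casefold() == data_ch.casefold()


if __name__ == "__main__":
    pairs = [
        (SMALL_SIGMA, CAPITAL_SIGMA),
        (FINAL_SIGMA, CAPITAL_SIGMA),
        (CAPITAL_SIGMA, SMALL_SIGMA),
        (CAPITAL_SIGMA, FINAL_SIGMA),
    ]
    for pattern_ch, data_ch in pairs:
        print(
            f"pattern {pattern_ch} vs data {data_ch}: "
            f"mapping={matches_by_case_mapping(pattern_ch, data_ch)} "
            f"folding={matches_by_case_folding(pattern_ch, data_ch)}"
        )
```

In this model, a pattern of σ or ς matches Σ in the data because both uppercase to Σ, but a pattern of Σ matches only σ, since lowercasing an isolated Σ yields σ and never ς. Comparing case-folded forms treats all three characters as equivalent, which is the behaviour Peter's suggestion is after.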
