Re: insensitive collations

From: "Daniel Verite" <daniel(at)manitou-mail(dot)org>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Peter Eisentraut" <peter(dot)eisentraut(at)2ndquadrant(dot)com>,"pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: insensitive collations
Date: 2018-12-20 14:39:39
Message-ID: 222ac3b7-fbee-4e79-b051-84479ddb6c8c@manitou-mail.org
Lists: pgsql-hackers

Tom Lane wrote:

> I don't really find it "natural" for equality to consider obviously
> distinct values to be equal.

According to https://www.merriam-webster.com/dictionary/natural
"natural" has no fewer than 15 meanings. The first in the list is
"based on an inherent sense of right and wrong"
which I admit is not what we want to imply in this context.

The meaning that I was thinking about was close to definitions
4: "following from the nature of the one in question"
or 7: "having a specified character by nature"
or 13: "closely resembling an original : true to nature"

When postgres uses the comparison from a collation
with no modification whatsoever, it's true to that collation.
When it changes the result from equal to non-equal, it's not.
If a collation says that "ABC" = "abc" and postgres says, mmh, OK
thanks but I'll go with "ABC" != "abc", then that denatures the
collation, in the sense of:
"to deprive of natural qualities : change the nature of"
(https://www.merriam-webster.com/dictionary/denature)
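
A minimal Python sketch of that distinction follows. It assumes a toy case-insensitive collation, and the helper names are hypothetical, not PostgreSQL code. collation_cmp reports the collation's own verdict. tiebroken_cmp falls back to a byte comparison whenever the collation says "equal", which is the kind of override described above.

```python
def collation_cmp(a: str, b: str) -> int:
    """Toy case-insensitive collation (hypothetical): compare case-folded text."""
    ka, kb = a.casefold(), b.casefold()
    return (ka > kb) - (ka < kb)


def bytewise_cmp(a: str, b: str) -> int:
    """strcmp()-like comparison of the UTF-8 bytes."""
    ba, bb = a.encode("utf-8"), b.encode("utf-8")
    return (ba > bb) - (ba < bb)


def tiebroken_cmp(a: str, b: str) -> int:
    """Keep the collation's verdict, except that 'equal' becomes a bytewise decision."""
    r = collation_cmp(a, b)
    return r if r != 0 else bytewise_cmp(a, b)


print(collation_cmp("ABC", "abc"))  # 0: the collation says the strings are equal
print(tiebroken_cmp("ABC", "abc"))  # -1: 'A' (0x41) sorts before 'a' (0x61)
```

Only results of 0 are affected by the tie-breaker, so the collation's ordering of strings it considers different is preserved. What changes is precisely its notion of equality.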

Aside from that, I'd be +1 for "linguistic" as the opposite of
"bytewise"; I think it is easily understood as expressing that a
strcoll()-like function is used as opposed to a strcmp()-like
function.
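
To illustrate the two families of comparison, here is a rough Python sketch. Sorting on the UTF-8 bytes mimics a strcmp()-like function. Sorting on str.casefold() is only a crude stand-in, assumed for illustration, for a strcoll()-like linguistic collation; real ones, such as ICU's, also handle accents, punctuation and much more.

```python
words = ["banana", "Apple", "apple", "Cherry"]

# strcmp()-like: compare raw UTF-8 bytes, so uppercase ASCII sorts before lowercase.
bytewise = sorted(words, key=lambda s: s.encode("utf-8"))

# Crude strcoll()-like stand-in: compare case-folded text (a toy, not a real collation).
# Python's sort is stable, so "Apple" and "apple" keep their input order.
linguistic = sorted(words, key=str.casefold)

print(bytewise)    # ['Apple', 'Cherry', 'apple', 'banana']
print(linguistic)  # ['Apple', 'apple', 'banana', 'Cherry']
```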

I'm -1 for "deterministic" as a replacement for "bytewise". Even
if Unicode has chosen that term for exactly the behavior we're talking
about, it's heavily used in the more general sense of:
"given a particular input, will always produce the same output"
(quoted from https://en.wikipedia.org/wiki/Deterministic_algorithm)
which we very much expect from all our string comparisons, whatever
flags we may put on the collations. "bytewise" might be less academic,
but it has less potential for misinterpretation.
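
The two senses of "deterministic" can be told apart in a small Python sketch. It again uses a hypothetical case-insensitive equality as a stand-in for an insensitive collation. The function is deterministic as an algorithm, yet it is not "deterministic" in Unicode's sense, because distinct strings compare equal.

```python
def ci_equal(a: str, b: str) -> bool:
    # Hypothetical stand-in for an insensitive collation's equality test.
    return a.casefold() == b.casefold()


# Deterministic in the algorithmic sense: same inputs, same answer on every call.
results = {ci_equal("ABC", "abc") for _ in range(1000)}

# Not "deterministic" in Unicode's sense: byte-distinct strings still compare equal.
distinct_but_equal = "ABC".encode() != "abc".encode() and ci_equal("ABC", "abc")

print(results)             # {True}
print(distinct_but_equal)  # True
```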

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
