insensitive collations

From: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: insensitive collations
Date: 2018-12-18 21:36:51
Message-ID: 1ccc668f-4cbc-0bef-af67-450b47cdfee7@2ndquadrant.com
Lists: pgsql-hackers

With various patches and discussions around collations going on, I
figured I'd send in my in-progress patch for insensitive collations.

This adds a flag "insensitive" to collations. Such a collation disables
various optimizations that assume that strings are equal only if they
are byte-wise equal. That then allows use cases such as
case-insensitive or accent-insensitive comparisons or handling of
strings with different Unicode normal forms.

So this doesn't actually make the collation case-insensitive or
anything, it just allows a library-provided collation that is, say,
case-insensitive to actually work that way. So maybe "insensitive"
isn't the right name for this flag, but we can think about that.

The jobs of this patch, aside from some DDL extensions, are to track
collation assignment in plan types where it has so far been
ignored, and then make the various collation-aware functions take the
insensitive flag into account. In comparison functions this just means
skipping past the memcmp() optimizations. In hashing functions, this
means converting the string to a sort key (think strxfrm()) before hashing.
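
To illustrate the two code paths described above, here is a minimal
standalone sketch. It is not code from the patch. toy_sort_key() is a
hypothetical stand-in for a provider sort key (strxfrm() or an ICU sort
key) that only folds ASCII case, fnv1a() stands in for the real hash
function, and sketch_hash() and sketch_equal() are names made up for
this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_KEY 256     /* assumption: inputs in this sketch are short */

/*
 * Hypothetical stand-in for a collation sort key (think strxfrm()).
 * It only folds ASCII case; a real provider would apply its own rules.
 */
static void
toy_sort_key(const char *s, size_t len, unsigned char *key)
{
    assert(len <= MAX_KEY);
    for (size_t i = 0; i < len; i++)
        key[i] = (unsigned char) tolower((unsigned char) s[i]);
}

/* 32-bit FNV-1a, standing in for the real hash function. */
static uint32_t
fnv1a(const unsigned char *p, size_t len)
{
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < len; i++)
    {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Hash the raw bytes, or the sort key if the collation is insensitive. */
uint32_t
sketch_hash(const char *s, bool insensitive)
{
    size_t        len = strlen(s);
    unsigned char key[MAX_KEY];

    if (!insensitive)
        return fnv1a((const unsigned char *) s, len);

    toy_sort_key(s, len, key);
    return fnv1a(key, len);
}

/* Equality: the memcmp() shortcut is only valid for sensitive collations. */
bool
sketch_equal(const char *a, const char *b, bool insensitive)
{
    size_t        la = strlen(a);
    size_t        lb = strlen(b);
    unsigned char ka[MAX_KEY];
    unsigned char kb[MAX_KEY];

    if (!insensitive)
        return la == lb && memcmp(a, b, la) == 0;

    /* Skip the byte-wise shortcut and compare by collation rules. */
    toy_sort_key(a, la, ka);
    toy_sort_key(b, lb, kb);
    return la == lb && memcmp(ka, kb, la) == 0;
}
```

With the flag set, sketch_equal("Foo", "FOO", true) holds and both
strings hash to the same value, so hash-based plans stay consistent with
equality. Without the flag, both functions work on the raw bytes.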

Various pieces are incomplete, but the idea should be clear from this.
So far I have implemented only the ICU branch in hashtext(); the libc
provider branch needs to be added (or maybe we won't want to). All the
changes around the "name" type haven't been taken into account. Foreign
key support (see ri_GenerateQualCollation()) needs to be addressed.
More tests for all the different plans need to be added. But in
principle it works quite well, as you can see in the tests added so far.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment: v1-0001-Insensitive-collations.patch (text/plain, 71.9 KB)
