Re: ICU integration

From: Doug Doole <ddoole(at)salesforce(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ICU integration
Date: 2016-09-07 17:32:35
Message-ID: CAP6UvaMTJYCxSBqhOnMwTS-vu=u7wvut-3k6TQ4eddtnSd4a1Q@mail.gmail.com
Lists: pgsql-hackers

>
> This isn't a problem for Postgres, or at least wouldn't be right now,
> because we don't have case insensitive collations.

I was wondering if Postgres might be that way. It does avoid the RI
constraint problem, but there are still troubles with range based
predicates. (My previous project wanted case/accent insensitive collations,
so we got to deal with it all.)

> So, we use a strcmp()/memcmp() tie-breaker when strcoll() indicates
> equality, while also making the general notion of text equality actually
> mean binary equality.

We used a similar tie breaker in places. (e.g. Index keys needed to be
identical, not just equal. We also broke ties in sort to make its behaviour
more deterministic.)
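
To make the tie-breaker idea concrete, here is a minimal sketch in C. It is not code from Postgres or from my previous project. ci_collate() is a hypothetical stand-in for a case-insensitive collation (ASCII only); a real system would call strcoll() or an ICU collator at that point. compare_with_tiebreak() is likewise an illustrative name.

```c
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical stand-in for a case-insensitive collation: compares two
 * ASCII strings ignoring case. Returns -1, 0, or 1.
 */
static int ci_collate(const char *a, const char *b)
{
    while (*a && *b) {
        int ca = tolower((unsigned char) *a);
        int cb = tolower((unsigned char) *b);

        if (ca != cb)
            return (ca < cb) ? -1 : 1;
        a++;
        b++;
    }
    if (*a)
        return 1;   /* a is longer, so it sorts after b */
    if (*b)
        return -1;  /* b is longer, so a sorts first */
    return 0;       /* equal under the collation */
}

/*
 * Collation comparison with a binary tie-breaker: strings that the
 * collation calls equal are ordered by strcmp(), so only byte-identical
 * strings compare as 0. Returns -1, 0, or 1.
 */
static int compare_with_tiebreak(const char *a, const char *b)
{
    int r = ci_collate(a, b);

    if (r != 0)
        return r;
    r = strcmp(a, b);
    return (r > 0) - (r < 0);
}
```

With this, "abc" and "ABC" are equal according to ci_collate() but still get a fixed relative order from compare_with_tiebreak(), which is the property an index or a deterministic sort wants. The cost is that "equal" at this level means identical, which is exactly what makes a true case-insensitive equality hard to layer on top.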

> I would like to get case insensitive collations some day, and was
> really hoping that ICU would help. That being said, the need for a
> strcmp() tie-breaker makes that hard. Oh well.
>

Prior to adding ICU to my previous project, it had the assumption that
equal meant identical as well. It turned out to be a lot easier to break
this assumption than I expected, but that code base had religiously used
its own string comparison function for user data - strcmp()/memcmp() was
never called for user data. (I don't know if the same can be said for
Postgres.) We found that very few places needed to be aware of values that
were equal but not identical. (Index and sort were the big two.)

Hopefully Postgres will be the same.

--
Doug Doole
