Re: ICU integration

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Dave Page <dpage(at)pgadmin(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Devrim Gündüz <devrim(at)gunduz(dot)org>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Doug Doole <ddoole(at)salesforce(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ICU integration
Date: 2016-09-12 18:04:06
Message-ID: CAM3SWZQ1uSbrVqmQAqLCtTTrM4Q47=9QByJALKnsyPAxdxJbcw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Sep 9, 2016 at 6:39 AM, Dave Page <dpage(at)pgadmin(dot)org> wrote:
> Looking back at my old emails, apparently ICU 5.0 and later include
> ucol_strcollUTF8() which avoids the need to convert UTF-8 characters
> to 16 bit before sorting. RHEL 6 has the older 4.2 version of ICU.

At the risk of stating the obvious, there is a reason why ICU
traditionally worked with UTF-16 natively. It's the same reason why
many OSes and application frameworks (e.g., Java) use UTF-16
internally, even though UTF-8 is much more popular on the web. Which
is: there are certain low-level optimizations possible that are not
possible with UTF-8.

I'm not saying that it would be just as good if we were to not use the
UTF-8 optimized stuff that ICU now has. My point is that it's not
useful to prejudge whether or not performance will be acceptable based
on a factor like this, which is ultimately just an implementation
detail. The ICU patch either performs acceptably as a substitute for
something like glibc, or it does not.

--
Peter Geoghegan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2016-09-12 18:05:33 Re: Implement targetlist SRFs using ROWS FROM() (was Changed SRF in targetlist handling)
Previous Message Andres Freund 2016-09-12 17:58:11 Re: Implement targetlist SRFs using ROWS FROM() (was Changed SRF in targetlist handling)