From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Martijn van Oosterhout <kleptog(at)svana(dot)org> |
Cc: | Craig Ringer <craig(at)2ndquadrant(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Quan Zongliang <quanzongliang(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: improve Chinese locale performance |
Date: | 2013-08-01 16:09:45 |
Message-ID: | CA+TgmoaaGa3HyYmMKFgc4m2Cps8Vv7L8534-89D_AaA5YS1CqA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sun, Jul 28, 2013 at 5:39 AM, Martijn van Oosterhout
<kleptog(at)svana(dot)org> wrote:
> On Tue, Jul 23, 2013 at 10:34:21AM -0400, Robert Haas wrote:
>> I pretty much lost interest in ICU upon reading that they use UTF-16
>> as their internal format.
>>
>> http://userguide.icu-project.org/strings#TOC-Strings-in-ICU
>
> The UTF-8 support has been steadily improving:
>
> For example, icu::Collator::compareUTF8() compares two UTF-8 strings
> incrementally, without converting all of the two strings to UTF-16 if
> there is an early base letter difference.
>
> http://userguide.icu-project.org/strings/utf-8
>
> For all other encodings you should be able to use an iterator. As to
> performance I have no idea.
>
> The main issue with strxfrm() is its lame API. If it supported
> returning prefixes you'd be set, but as it is you need >10MB of memory
> just to transform a 10MB string, even if only the first few characers
> would be enough to sort...
Yep, definitely. And by ">10MB" you mean ">90MB", at least on my Mac,
which is really outrageous.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From | Date | Subject | |
---|---|---|---|
Next Message | Noah Misch | 2013-08-01 16:15:19 | pg_basebackup vs. Windows and tablespaces |
Previous Message | Alvaro Herrera | 2013-08-01 15:32:21 | Re: new "row-level lock" error messages |