From: | Oleg Bartunov <obartunov(at)gmail(dot)com> |
---|---|
To: | Peter Geoghegan <pg(at)heroku(dot)com>, Teodor Sigaev <teodor(at)postgrespro(dot)ru> |
Cc: | Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, tgl Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Subject: | Re: Draft release notes for next week's releases |
Date: | 2016-03-28 07:55:50 |
Message-ID: | CAF4Au4w10NmS4wit98yLCTCwjVrkaMmScmYvNm8hO6PUjwRt6A@mail.gmail.com |
Lists: | pgsql-hackers |
On Mon, Mar 28, 2016 at 1:21 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> On Mon, Mar 28, 2016 at 12:08 AM, Oleg Bartunov <obartunov(at)gmail(dot)com> wrote:
> > Should we start thinking about ICU? I compared Postgres with and without
> > ICU and found a 27x improvement in btree index creation for Russian
> > strings. This includes the effect of abbreviated keys and of ICU itself.
> > Also, we'd get a system-independent locale.
>
> I think we should. I want to develop a detailed proposal before
> talking about it more, though, because the idea is controversial.
>
> Did you use the FreeBSD ports patch? Do you have your own patch that
> you could share?
>
We'll post the patch. As I remember, Teodor did something to get abbreviated
keys working. I should say that I got the 27x improvement on my MacBook; I
will check on Linux.
>
> I'm not surprised that ICU is so much faster, especially now that
> UTF-8 is not a second class citizen (it's been possible to build ICU
> to specialize all its routines to handle UTF-8 for years now). As you
> may know, ICU supports partial sort keys, and sort key compression,
> which may have also helped:
> http://userguide.icu-project.org/collation/architecture
>
> That page also describes how binary sort keys are versioned, which
> allows them to be stored on disk. It says "A common example is the use
> of keys to build indexes in databases". We'd be crazy to trust Glibc
> strxfrm() to be stable *on disk*, but ICU already cares deeply about
> the things we need to care about, because it's used by other database
> systems like DB2, Firebird, and in some configurations SQLite [1].
>
> Glibc strxfrm() is not great with codepoints from the Cyrillic
> alphabet -- it seems to store 2 bytes per code-point in the primary
> weight level. So ICU might also do better in your test case for that
> reason.
>
Yes, I see on this page that ICU is ~3 times faster for Russian text:
http://site.icu-project.org/charts/collation-icu4c48-glibc
>
> [1] https://www.sqlite.org/src/artifact?ci=trunk&filename=ext/icu/README.txt
> --
> Peter Geoghegan
>
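
For readers unfamiliar with the abbreviated-key idea discussed above, here is a minimal sketch in Python. It is not the PostgreSQL implementation. The names `full_sort_key`, `abbreviated_key`, and `ABBREV_LEN` are hypothetical, and plain UTF-8 bytes stand in for a real binary sort key such as the output of strxfrm() or ICU, so the resulting order is code point order rather than a true Russian collation. The point is only the mechanism: compare a short fixed-size prefix of the sort key first, and fall back to the full comparison when the prefixes tie.

```python
from functools import cmp_to_key

ABBREV_LEN = 8  # assumption: the abbreviated key is 8 bytes wide


def full_sort_key(s: str) -> bytes:
    # Stand-in for a real collation sort key (strxfrm() or ICU).
    # UTF-8 bytes sort in code point order, which is NOT a real collation.
    return s.encode("utf-8")


def abbreviated_key(s: str) -> bytes:
    # Keep only the leading bytes of the full sort key.
    return full_sort_key(s)[:ABBREV_LEN]


def sort_with_abbreviated_keys(strings):
    # Compute each abbreviated key once, up front.
    decorated = [(abbreviated_key(s), s) for s in strings]

    def compare(a, b):
        # Cheap path: the short prefixes already decide the order.
        if a[0] != b[0]:
            return -1 if a[0] < b[0] else 1
        # Prefixes tie: fall back to the full (expensive) comparison.
        ka, kb = full_sort_key(a[1]), full_sort_key(b[1])
        return (ka > kb) - (ka < kb)

    return [s for _, s in sorted(decorated, key=cmp_to_key(compare))]


words = ["привет", "приветствие", "мир", "привал"]
result = sort_with_abbreviated_keys(words)
```

In this stand-in encoding each Cyrillic letter takes 2 bytes, so an 8-byte prefix distinguishes only the first 4 letters, and the three words starting with "прив" all need the full comparison. That is analogous to the concern raised above about strxfrm() spending 2 bytes per code point in the primary weight level: the denser the sort key, the more often the abbreviated key alone resolves a comparison.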