Re: Draft release notes for next week's releases

From: Oleg Bartunov <obartunov(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>, Teodor Sigaev <teodor(at)postgrespro(dot)ru>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, tgl Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Draft release notes for next week's releases
Date: 2016-03-28 07:55:50
Message-ID: CAF4Au4w10NmS4wit98yLCTCwjVrkaMmScmYvNm8hO6PUjwRt6A@mail.gmail.com
Lists: pgsql-hackers

On Mon, Mar 28, 2016 at 1:21 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:

> On Mon, Mar 28, 2016 at 12:08 AM, Oleg Bartunov <obartunov(at)gmail(dot)com> wrote:
> > Should we start thinking about ICU? I compared Postgres with and without
> > ICU and found a 27x improvement in btree index creation for Russian
> > strings. This includes the effect of abbreviated keys and of ICU itself.
> > Also, we'd get a system-independent locale.
>
> I think we should. I want to develop a detailed proposal before
> talking about it more, though, because the idea is controversial.
>
> Did you use the FreeBSD ports patch? Do you have your own patch that
> you could share?
>

We'll post the patch. As I remember, Teodor made something to get abbreviated
keys to work. I should say that I got the 27x improvement on my MacBook. I will
check on Linux.
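
For readers following the thread, here is a minimal sketch of the abbreviated-key idea, not the PostgreSQL implementation. It assumes an 8-byte abbreviation and uses plain UTF-8 bytes as a stand-in for a real collation sort key (strxfrm() or an ICU sort key); the names full_key, abbreviate and sort_with_abbreviated_keys are hypothetical.

```python
import struct

ABBREV_BYTES = 8  # assumed abbreviation width (one 64-bit machine word)


def full_key(text: str) -> bytes:
    # Stand-in for a collation sort key. Plain UTF-8 bytes sort in code
    # point order, which is NOT a real locale-aware collation.
    return text.encode("utf-8")


def abbreviate(key: bytes) -> int:
    # Keep the first ABBREV_BYTES bytes, zero-padded on the right, and read
    # them as a big-endian integer so that integer comparison agrees with
    # bytewise comparison of the prefix.
    prefix = key[:ABBREV_BYTES].ljust(ABBREV_BYTES, b"\x00")
    return struct.unpack(">Q", prefix)[0]


def sort_with_abbreviated_keys(strings):
    # Tuples compare element by element: the cheap integer decides most
    # comparisons, and the full key is consulted only when two
    # abbreviations are equal.
    decorated = []
    for s in strings:
        key = full_key(s)
        decorated.append((abbreviate(key), key, s))
    decorated.sort(key=lambda item: (item[0], item[1]))
    return [s for _, _, s in decorated]
```

With this stand-in, each Cyrillic letter takes 2 bytes, so an 8-byte abbreviation covers only four letters and strings such as "абрикос" and "абрикосовый" tie on the abbreviation and need the full comparison. A key format that spends fewer bytes per character would presumably produce fewer such ties.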

>
> I'm not surprised that ICU is so much faster, especially now that
> UTF-8 is not a second class citizen (it's been possible to build ICU
> to specialize all its routines to handle UTF-8 for years now). As you
> may know, ICU supports partial sort keys, and sort key compression,
> which may have also helped:
> http://userguide.icu-project.org/collation/architecture
>
> That page also describes how binary sort keys are versioned, which
> allows them to be stored on disk. It says "A common example is the use
> of keys to build indexes in databases". We'd be crazy to trust Glibc
> strxfrm() to be stable *on disk*, but ICU already cares deeply about
> the things we need to care about, because it's used by other database
> systems like DB2, Firebird, and in some configurations SQLite [1].
>
> Glibc strxfrm() is not great with codepoints from the Cyrillic
> alphabet -- it seems to store 2 bytes per code-point in the primary
> weight level. So ICU might also do better in your test case for that
> reason.
>

Yes, this page shows that ICU is ~3 times faster for Russian text:
http://site.icu-project.org/charts/collation-icu4c48-glibc

>
> [1]
> https://www.sqlite.org/src/artifact?ci=trunk&filename=ext/icu/README.txt
> --
> Peter Geoghegan
>
