Re: Improved ICU patch - WAS: Implementing full UTF-8 support (aka supporting 0x00)

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Palle Girgensohn <girgen(at)pingpong(dot)net>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Álvaro Hernández Tortosa <aht(at)8kdata(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Devrim Gündüz <devrim(at)gunduz(dot)org>, Jakob Egger <jakob(at)eggerapps(dot)at>, Tobias Bussmann <t(dot)bussmann(at)gmx(dot)net>
Subject: Re: Improved ICU patch - WAS: Implementing full UTF-8 support (aka supporting 0x00)
Date: 2016-08-11 16:56:25
Message-ID: CAM3SWZRVBUZ2pdcJ_Br7J+wzZv0m1m4sM=1uFgD+JWh8u8wgXw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Aug 11, 2016 at 4:22 AM, Palle Girgensohn <girgen(at)pingpong(dot)net> wrote:
> But in your strxfrm code in PostgreSQL, the keys are cached, and represented
> as int64:s if I remember correctly, so perhaps there is still a benefit
> using the abbreviated keys? More testing is required, I guess...

ICU's ucol_nextSortKeyPart() interface is faster for this, I believe,
and works because you only need the first sizeof(Datum) bytes (4 or 8
bytes). So, you have the right idea about using it (at least for the
abbreviated key stuff), but the second-to-last argument (the byte
count) needs to be sizeof(Datum).

The whole point of the abbreviated key optimization is that you can
avoid pointer chasing during each and every comparison. Your test here
is invalid because it doesn't involve any reuse of the keys. Often,
during a sort, each item has to be compared about 20 times.
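
To make the reuse argument concrete, here is a minimal sketch of the
idea. It is not PostgreSQL's actual sort support code: SortItem,
make_abbrev(), compare_items() and sort_keys() are hypothetical names,
and raw key bytes stand in for a real strxfrm() or ICU sort key. The
abbreviated key is computed once per item and stored inline, so most
comparisons are a plain integer compare, and the pointer to the full
key is only followed on a tie.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sort item: the abbreviated key is stored inline, while
 * the full key lives behind a pointer that is only followed on ties. */
typedef struct
{
    uint64_t    abbrev;     /* first 8 key bytes, packed big-endian */
    const char *full;       /* full key, used as the tie-breaker */
} SortItem;

/* Pack the first sizeof(uint64_t) bytes of the key so that unsigned
 * integer comparison orders items the same way a bytewise comparison of
 * those bytes would. Keys shorter than 8 bytes are zero-padded. */
static uint64_t
make_abbrev(const char *key)
{
    uint64_t    result = 0;
    size_t      len = strlen(key);
    size_t      i;

    for (i = 0; i < sizeof(uint64_t); i++)
    {
        unsigned char b = (i < len) ? (unsigned char) key[i] : 0;

        result = (result << 8) | b;
    }
    return result;
}

/* qsort() comparator: cheap integer compare first, full compare only
 * when the abbreviated keys are equal. */
static int
compare_items(const void *a, const void *b)
{
    const SortItem *x = a;
    const SortItem *y = b;

    if (x->abbrev != y->abbrev)
        return (x->abbrev < y->abbrev) ? -1 : 1;
    return strcmp(x->full, y->full);
}

/* Build each abbreviated key exactly once, then sort. Every later
 * comparison reuses the stored key instead of recomputing it. */
static void
sort_keys(SortItem *items, const char **keys, size_t n)
{
    size_t      i;

    for (i = 0; i < n; i++)
    {
        items[i].abbrev = make_abbrev(keys[i]);
        items[i].full = keys[i];
    }
    qsort(items, n, sizeof(SortItem), compare_items);
}
```

A benchmark that builds a key and throws it away after one comparison
measures only the cost of make_abbrev(), not the savings across the
many comparisons each item goes through in a real sort.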

I've experimented with ICU, and know it works well with this. You
really need to create a scenario with a real sort, and all the
conditions I describe.

--
Peter Geoghegan
