Re: speed up unicode normalization quick check

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
Cc: John Naylor <john(dot)naylor(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: speed up unicode normalization quick check
Date: 2020-10-08 06:48:23
Lists: pgsql-hackers

On Wed, Oct 07, 2020 at 03:18:44PM +0900, Michael Paquier wrote:
> About 0001, the new set of multipliers looks fine to me. Even if this
> adds an extra item in kwlist_d.h, going from 901 to 902 because 901 is
> divisible by 17, I don't think that this is much of a bother. As
> mentioned, this impacts none of the other tables, which are much
> smaller in size, and it goes back to normal once a new keyword is
> added. Being able to generate perfect hash functions for much larger
> sets is a nice property to have. While at it, I also looked at the
> assembly code generated by gcc -O2 for keywords.c & co and have not
> spotted any huge difference. So I'd like to apply this first if there
> are no objections.
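
For reference, the 901 to 902 bump discussed above can be reproduced
with a small sketch. It assumes that the generator starts from
2 * nkeys + 1 slots and steps past any size that is divisible by one of
the hash multipliers. The function name, the 450-key count and the
second multiplier (31) are illustrative assumptions, not values taken
from the patch.

```c
#include <stddef.h>

/*
 * Hypothetical helper: pick a hash table size for nkeys keys.
 *
 * Assumption: the starting size is 2 * nkeys + 1, and any size that is
 * a multiple of either multiplier is skipped.
 */
static size_t
choose_table_size(size_t nkeys, size_t mult1, size_t mult2)
{
    size_t      size = 2 * nkeys + 1;

    /* 901 = 17 * 53, so a multiplier of 17 pushes 901 up to 902 */
    while (size % mult1 == 0 || size % mult2 == 0)
        size++;

    return size;
}
```

Under these assumptions, choose_table_size(450, 17, 31) gives 902, while
a pair of multipliers that does not divide 901 leaves the size at 901.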

I looked at this one again today, and applied it. I also looked at what
the MSVC compiler is able to do in terms of shift-and-add optimizations
for the multipliers, and it is nowhere near as good as gcc or clang: it
emits imul for basically all the primes we could use for the perfect
hash generation.
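
To make the shift-and-add point concrete, here is a minimal sketch of
the rewrite a compiler can apply when the multiplier has the form
2^n + 1 or 2^n - 1. The multiplier 17 comes from the discussion above;
31 and all the function names are assumptions for illustration, and the
hash loop is not the generated PostgreSQL code.

```c
#include <stdint.h>

/* h * 17 written as one shift and one add: 17 = 2^4 + 1 */
static uint32_t
mul17_shift_add(uint32_t h)
{
    return (h << 4) + h;
}

/* h * 31 written as one shift and one subtract: 31 = 2^5 - 1 */
static uint32_t
mul31_shift_sub(uint32_t h)
{
    return (h << 5) - h;
}

/*
 * Illustrative multiplicative string hash using the multiplier 17.
 * A compiler that does not perform the rewrite above would emit a
 * multiply instruction for each input byte instead.
 */
static uint32_t
sketch_hash(const char *key, uint32_t seed)
{
    uint32_t    h = seed;

    while (*key)
        h = mul17_shift_add(h) + (unsigned char) *key++;

    return h;
}
```

Both helpers give the same result as a plain multiplication modulo
2^32, so the choice only affects the instructions emitted, not the hash
values.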

> I have tested 0002 and 0003, which had better be merged together in
> the end, and I can see performance improvements with MSVC and gcc
> similar to what is being reported upthread, with 20~30% gains for a
> simple data sample using IS NFC/NFKC. That's cool.
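
As a reminder of what the quick check measured here does, below is a
simplified sketch: each code point is looked up in a table of
exceptions, and the string is classified as YES, NO or MAYBE. The table
is a tiny hand-picked excerpt, the names are hypothetical, the lookup is
a linear scan standing in for the generated hash lookup, and the
combining-class ordering test of the full algorithm is left out.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum
{
    QC_YES,
    QC_NO,
    QC_MAYBE
} qc_result;

typedef struct
{
    uint32_t    codepoint;
    qc_result   nfc;
} qc_entry;

/* Tiny excerpt for illustration only; the real table is generated. */
static const qc_entry qc_table[] = {
    {0x0300, QC_MAYBE},         /* COMBINING GRAVE ACCENT */
    {0x0301, QC_MAYBE},         /* COMBINING ACUTE ACCENT */
    {0x0340, QC_NO},            /* COMBINING GRAVE TONE MARK */
    {0x0341, QC_NO},            /* COMBINING ACUTE TONE MARK */
    {0x212B, QC_NO},            /* ANGSTROM SIGN */
};

/* Stand-in lookup: code points absent from the table are QC_YES. */
static qc_result
qc_lookup(uint32_t cp)
{
    for (size_t i = 0; i < sizeof(qc_table) / sizeof(qc_table[0]); i++)
    {
        if (qc_table[i].codepoint == cp)
            return qc_table[i].nfc;
    }
    return QC_YES;
}

/* Simplified NFC quick check over an array of code points. */
static qc_result
quick_check_nfc(const uint32_t *s, size_t len)
{
    qc_result   result = QC_YES;

    for (size_t i = 0; i < len; i++)
    {
        qc_result   r = qc_lookup(s[i]);

        if (r == QC_NO)
            return QC_NO;       /* definitely not normalized */
        if (r == QC_MAYBE)
            result = QC_MAYBE;  /* full normalization needed to decide */
    }
    return result;
}
```

Since the lookup runs once per code point, its cost dominates the loop,
which is why a faster lookup shows up directly in IS NFC/NFKC timings.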

For these two, I have merged both together and made some adjustments as
per the attached. Not many tweaks, mainly some more comments for the
unicode header files as the number of generated structures gets
higher. FWIW, with the addition of the two hash tables,
libpgcommon_srv.a grows from 1032600B to 1089240B, which looks like a
small price to pay for the ~30% performance gains with the quick

Attachment Content-Type Size
uni-norm-hash-v5.patch text/x-diff 107.9 KB
