Re: speed up unicode normalization quick check

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
Cc: John Naylor <john(dot)naylor(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: speed up unicode normalization quick check
Date: 2020-10-07 06:18:44
Message-ID: 20201007061844.GB30037@paquier.xyz
Lists: pgsql-hackers

On Sat, Sep 19, 2020 at 04:09:27PM -0700, Mark Dilger wrote:
> I am marking this ready for committer. I didn't object to the
> whitespace weirdness in your patch (about which `git apply`
> grumbles) since you seem to have done that intentionally. I have no
> further comments on the performance issue, since I don't have any
> other platforms at hand to test it on. Whichever committer picks
> this up can decide if the issue matters to them enough to punt it
> back for further performance testing.

About 0001, the new set of multipliers looks fine to me. It does grow
the table in kwlist_d.h from 901 to 902 entries, because 901 is
divisible by 17, but I don't think that is much of a bother. As
mentioned, this impacts none of the other tables, which are much
smaller, and the size will go back to normal once a new keyword is
added. Being able to generate perfect hash functions for much larger
sets is a nice property to have. While at it, I also looked at the
assembly generated by gcc -O2 for keywords.c & co and did not spot
any significant difference. So I'd like to apply this first if there
are no objections.
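
For illustration, the 901 to 902 bump can be reproduced with a minimal
sketch. It assumes the generator rounds the table size up until no
multiplier divides it. The function name and the loop are
hypothetical and are not taken from PerfectHash.pm; 17 is the only
multiplier named in this message.

```python
def next_table_size(size, multipliers):
    """Return the smallest n >= size that none of the multipliers divides.

    Hypothetical helper: it only illustrates the rounding rule assumed
    above, not the actual generator code.
    """
    while any(size % m == 0 for m in multipliers):
        size += 1
    return size


# 901 = 17 * 53, so a multiplier of 17 forces one extra slot.
print(next_table_size(901, [17]))  # prints 902
# A size that 17 does not divide is left alone.
print(next_table_size(903, [17]))  # prints 903
```

Under that assumption, a table that gains one more keyword would get
a different starting size and would no longer need the extra slot.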

I have tested 0002 and 0003, which had better be merged together in the
end, and I can see performance improvements with MSVC and gcc similar
to what has been reported upthread, with 20~30% gains on a simple
data sample using IS NFC/NFKC. That's cool.

Adding unicode_normprops_table.h to the files ignored by pgindent
is also fine in the end, even with the changes that bring the
generated structures more in line with what pgindent produces.
One tiny comment: I would have added a comment in the generated
unicode header to document the set of structures generated for the
perfect hash, but that's easy enough to add.
--
Michael
