Re: Unicode normalization SQL functions

From: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Daniel Verite <daniel(at)manitou-mail(dot)org>, Andreas Karlsson <andreas(at)proxel(dot)se>
Subject: Re: Unicode normalization SQL functions
Date: 2020-03-26 07:25:46
Message-ID: 7052cc8f-0164-72a8-d9a4-fd32066c938e@2ndquadrant.com
Lists: pgsql-hackers

On 2020-03-24 10:20, Peter Eisentraut wrote:
> Now I have some concerns about the size of the new table in
> unicode_normprops_table.h, and the resulting binary size. At the very
> least, we should probably make that #ifndef FRONTEND or something like
> that so libpq isn't bloated by it unnecessarily. Perhaps there is a
> better format for that table? Any ideas?

I have figured this out. New patch is attached.

First, I have added #ifndef FRONTEND, as mentioned above, so libpq isn't
bloated. Second, I have changed the lookup structure to a bitfield, so
each entry is only 32 bits instead of 64. Third, I have dropped the
quickcheck tables for the NFD and NFKD forms. Those are by far the
biggest tables, and you still get okay performance if you do the
normalization check the long way, since those cases don't need the
recomposition step, which is by far the slowest part. The main use
case of all of this, I expect, is to check for NFC normalization, so
it's okay if the other variants are not optimized to the same extent.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment: v4-0001-Add-SQL-functions-for-Unicode-normalization.patch (text/plain, 224.6 KB)
