Re: Unicode normalization SQL functions

From: Andreas Karlsson <andreas(at)proxel(dot)se>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Daniel Verite <daniel(at)manitou-mail(dot)org>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Unicode normalization SQL functions
Date: 2020-02-13 00:23:41
Message-ID: 26150b35-240f-941c-e5a7-24f2d489b316@proxel.se
Lists: pgsql-hackers

On 1/28/20 9:21 PM, Peter Eisentraut wrote:
> You're right, this didn't make any sense.  Here is a new patch set with
> that fixed.

Thanks for this patch. This is a feature which has been on my personal
todo list for a while and something which I have wished to have a couple
of times.

I took a quick look at the patch and here is some feedback:

A possible concern is the increased binary size from the new tables for
the quick check, but personally I think they are worth it.

A potential optimization would be to merge utf8_to_unicode() and
pg_utf_mblen() into one function in unicode_normalize_func(), since
utf8_to_unicode() already knows the length of the character. Probably
not worth it though.
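
To make the idea concrete, here is a rough sketch of a decoder that hands back both the code point and the byte length in one pass. The name utf8_decode_with_len() and its signature are made up for illustration and are not part of the patch; the sketch also assumes the input is already valid UTF-8, so it does no error checking.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): decode one UTF-8 sequence
 * starting at s, store its byte length in *len, and return the code point.
 * Assumes s points at the first byte of a valid UTF-8 sequence.
 */
uint32_t
utf8_decode_with_len(const unsigned char *s, int *len)
{
	if ((s[0] & 0x80) == 0)
	{
		/* 0xxxxxxx: one byte */
		*len = 1;
		return s[0];
	}
	else if ((s[0] & 0xe0) == 0xc0)
	{
		/* 110xxxxx 10xxxxxx: two bytes */
		*len = 2;
		return ((uint32_t) (s[0] & 0x1f) << 6) |
			(uint32_t) (s[1] & 0x3f);
	}
	else if ((s[0] & 0xf0) == 0xe0)
	{
		/* 1110xxxx 10xxxxxx 10xxxxxx: three bytes */
		*len = 3;
		return ((uint32_t) (s[0] & 0x0f) << 12) |
			((uint32_t) (s[1] & 0x3f) << 6) |
			(uint32_t) (s[2] & 0x3f);
	}
	else
	{
		/* 11110xxx followed by three continuation bytes: four bytes */
		*len = 4;
		return ((uint32_t) (s[0] & 0x07) << 18) |
			((uint32_t) (s[1] & 0x3f) << 12) |
			((uint32_t) (s[2] & 0x3f) << 6) |
			(uint32_t) (s[3] & 0x3f);
	}
}
```

The caller would then advance its pointer by *len instead of calling a separate length function on the same lead byte.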

It feels a bit wasteful to measure output_size in
unicode_is_normalized(), since unicode_normalize() already knows the
length of the buffer; it just does not return it.

A potential optimization for the normalized case would be to abort the
quick check at the first "maybe" result and normalize only from that
point on. If I can find the time I might try this out and benchmark it.
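
Roughly what I have in mind, as an untested sketch: scan until the first code point whose quick check result is not "yes" and report that position. Everything here is a stand-in for illustration. The toy_ names are invented, and toy_quickcheck() simply treats U+0300..U+036F as "maybe" rather than consulting the generated tables. A real version would probably also have to back up to the preceding starter before normalizing, since the uncertain character may combine with what comes before it, and it would need the combining class ordering check as well.

```c
#include <stdint.h>

/* Stand-in result type; the names are invented for this sketch. */
typedef enum
{
	TOY_QC_YES,
	TOY_QC_NO,
	TOY_QC_MAYBE
} ToyQC;

/*
 * Toy property lookup (an assumption, not the real table): report the
 * combining diacritical marks block as "maybe", everything else as "yes".
 */
ToyQC
toy_quickcheck(uint32_t cp)
{
	if (cp >= 0x0300 && cp <= 0x036F)
		return TOY_QC_MAYBE;
	return TOY_QC_YES;
}

/*
 * Return the index of the first code point that is not a definite "yes",
 * or n if the whole input passes the quick check. Everything before the
 * returned index could be left alone; only the tail would need the full
 * normalization treatment.
 */
int
toy_first_uncertain(const uint32_t *cps, int n)
{
	for (int i = 0; i < n; i++)
	{
		if (toy_quickcheck(cps[i]) != TOY_QC_YES)
			return i;
	}
	return n;
}
```

Whether this wins anything in practice depends on how often the uncertain character shows up early in the string, hence the wish to benchmark it.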

Nitpick: "split/\s*;\s*/, $line" in generate-unicode_normprops_table.pl
should be "split /\s*;\s*/, $line".

What about using else if in the code below for clarity?

+ if (check == UNICODE_NORM_QC_NO)
+     return UNICODE_NORM_QC_NO;
+ if (check == UNICODE_NORM_QC_MAYBE)
+     result = UNICODE_NORM_QC_MAYBE;
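
For reference, a self-contained sketch of the else if form. The enum and the function toy_combine_checks() are stand-ins written for this mail, not the definitions from the patch, and I am assuming a UNICODE_NORM_QC_YES value exists alongside the two shown above. Behaviour is the same as with the two separate ifs; the else if only makes it obvious that the branches are mutually exclusive.

```c
/* Stand-in enum; the real definition lives in the patch and may differ. */
typedef enum
{
	UNICODE_NORM_QC_NO,
	UNICODE_NORM_QC_YES,	/* assumed to exist, not shown in the hunk */
	UNICODE_NORM_QC_MAYBE
} ToyNormQC;

/*
 * Hypothetical wrapper around the quoted logic: fold per-character quick
 * check results into one answer. A single NO decides the outcome
 * immediately; a MAYBE downgrades the result but scanning continues.
 */
ToyNormQC
toy_combine_checks(const ToyNormQC *checks, int n)
{
	ToyNormQC	result = UNICODE_NORM_QC_YES;

	for (int i = 0; i < n; i++)
	{
		ToyNormQC	check = checks[i];

		if (check == UNICODE_NORM_QC_NO)
			return UNICODE_NORM_QC_NO;
		else if (check == UNICODE_NORM_QC_MAYBE)
			result = UNICODE_NORM_QC_MAYBE;
	}

	return result;
}
```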

Remove extra space in the line below.

+ else if (quickcheck == UNICODE_NORM_QC_NO )

Andreas
