Re: Unicode normalization SQL functions

From: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
To: Daniel Verite <daniel(at)manitou-mail(dot)org>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Unicode normalization SQL functions
Date: 2020-01-28 20:21:18
Message-ID: 43f13518-010a-8319-8013-f319522ea719@2ndquadrant.com
Lists: pgsql-hackers

On 2020-01-28 10:48, Daniel Verite wrote:
> I found a bug in unicode_is_normalized_quickcheck() which is
> triggered when the last codepoint of the string is beyond
> U+10000. On encountering it, it does:
> +    if (is_supplementary_codepoint(ch))
> +        p++;
> When ch is the last codepoint, it makes p point to
> the ending zero, but the subsequent p++ done by
> the for loop makes it miss the exit and go into over-reading.
>
> But anyway, what's the reason for skipping the codepoint
> following a codepoint outside of the BMP?

You're right, this didn't make any sense. Here is a new patch set with
that fixed.
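
For illustration only, here is a minimal sketch of the loop shape once the extra increment is removed. It is not the code from the attached patches. It assumes the input is a zero-terminated array of already-decoded code points, one element per code point, so a supplementary code point occupies a single slot and there is nothing to skip. The uint32_t element type and count_supplementary() are hypothetical stand-ins; is_supplementary_codepoint() is named after the helper in the quoted hunk, but its body here is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Assumed definition: true for code points outside the BMP
 * (U+10000 through U+10FFFF).
 */
static int
is_supplementary_codepoint(uint32_t ch)
{
    return ch >= 0x10000 && ch <= 0x10FFFF;
}

/*
 * Hypothetical stand-in for the per-code-point work of the quick check:
 * count the supplementary code points in a zero-terminated array.
 *
 * The only increment of p is the one in the for statement.  With an
 * additional "p++" inside the body, a supplementary code point in the
 * last position would leave p on the terminator, the loop increment
 * would then step past it, and the "*p" test would read beyond the end.
 */
static size_t
count_supplementary(const uint32_t *input)
{
    size_t n = 0;

    for (const uint32_t *p = input; *p; p++)
    {
        if (is_supplementary_codepoint(*p))
            n++;            /* note: no extra p++ here */
    }
    return n;
}
```

With this shape, an input such as { 0x61, 0x1F600, 0 } terminates on the zero element regardless of where the supplementary code point sits.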

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
v3-0001-Add-support-for-other-normal-forms-to-Unicode-nor.patch text/plain 370.0 KB
v3-0002-Add-SQL-functions-for-Unicode-normalization.patch text/plain 1.1 MB
