Re: Unicode normalization SQL functions

From:	Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
To:	Daniel Verite <daniel(at)manitou-mail(dot)org>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Unicode normalization SQL functions
Date:	2020-01-28 20:21:18
Message-ID:	43f13518-010a-8319-8013-f319522ea719@2ndquadrant.com
Lists:	pgsql-hackers

On 2020-01-28 10:48, Daniel Verite wrote:
> I found a bug in unicode_is_normalized_quickcheck() which is
> triggered when the last codepoint of the string is beyond
> U+10000. On encountering it, it does:
> + if (is_supplementary_codepoint(ch))
> + p++;
> When ch is the last codepoint, it makes p point to
> the ending zero, but the subsequent p++ done by
> the for loop makes it miss the exit and go into over-reading.
>
> But anyway, what's the reason for skipping the codepoint
> following a codepoint outside of the BMP?

You're right, this didn't make any sense. Here is a new patch set with
that fixed.
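
For readers following along without the patch, the over-read Daniel describes can be sketched as below. This is only an illustration, not the code from the patch: it assumes the input is a zero-terminated array of 32-bit code points, the helper is a hypothetical stand-in for the one named in the quoted diff, and the function names and the `len` bound are invented so that the demonstration never actually reads out of bounds. The second function shows what dropping the extra skip presumably looks like.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the helper named in the quoted diff. */
static bool
is_supplementary_codepoint(uint32_t ch)
{
    return ch >= 0x10000 && ch <= 0x10FFFF;
}

/*
 * Loop shape with the extra skip. Returns the index at which the walk
 * stops. "len" counts all array elements including the terminating zero;
 * it exists only so this demo stays in bounds. A loop that tests nothing
 * but the terminator would keep reading past the end.
 */
static size_t
walk_with_extra_skip(const uint32_t *s, size_t len)
{
    size_t i;

    for (i = 0; i < len && s[i] != 0; i++)
    {
        if (is_supplementary_codepoint(s[i]))
            i++;    /* extra skip: may land on the terminator */
    }
    return i;
}

/*
 * Same walk without the skip. Each array element is already a whole
 * code point, so there is nothing to step over.
 */
static size_t
walk_without_skip(const uint32_t *s, size_t len)
{
    size_t i;

    for (i = 0; i < len && s[i] != 0; i++)
        ;           /* per-codepoint checks would go here */
    return i;
}
```

For the three-element array { 0x61, 0x1F600, 0 }, the first walk stops at index 3, one past the terminator at index 2, while the second stops at index 2.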

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment	Content-Type	Size
v3-0001-Add-support-for-other-normal-forms-to-Unicode-nor.patch	text/plain	370.0 KB
v3-0002-Add-SQL-functions-for-Unicode-normalization.patch	text/plain	1.1 MB

In response to

Re: Unicode normalization SQL functions at 2020-01-28 09:48:45 from Daniel Verite

Responses

Re: Unicode normalization SQL functions at 2020-02-13 00:23:41 from Andreas Karlsson
