Re: Implementation of SASLprep for SCRAM-SHA-256

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Implementation of SASLprep for SCRAM-SHA-256
Date: 2017-04-05 16:33:13
Message-ID: bcdd548d-04ce-69a2-1328-29627104d212@iki.fi
Lists: pgsql-hackers

On 04/05/2017 07:23 AM, Michael Paquier wrote:
> On Wed, Apr 5, 2017 at 7:05 AM, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>> I will continue tomorrow, but I wanted to report on what I've done so far.
>> Attached is a new patch version, quite heavily modified. Notable changes so
>> far:
>
> Great, thanks!
>
>> * Use Unicode codepoints, rather than UTF-8 bytes packed in 32-bit ints.
>> IMHO this makes the tables easier to read (to a human), and they are also
>> packed slightly more tightly (see next two points), as you can fit more
>> codepoints in a 16-bit integer.
>
> Using codepoints directly is not very consistent with the existing
> backend, but for the sake of packing things more tightly, OK.

Oh, I see, we already have similar functions in wchar.c:
unicode_to_utf8() and utf8_to_unicode(). We should probably move those
to src/common, rather than reinvent the wheel.
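
For readers following along, here is a rough sketch of what such
conversion helpers do. This is not the wchar.c code; the sketch_* names
are made up for illustration, the decoder assumes well-formed input, and
neither function checks for surrogates or overlong forms.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: encode one code point as UTF-8 into 'out', which
 * must have room for 4 bytes. Returns the number of bytes written, or 0
 * if the code point is beyond U+10FFFF.
 */
size_t
sketch_unicode_to_utf8(uint32_t cp, unsigned char *out)
{
    if (cp <= 0x7F)
    {
        out[0] = (unsigned char) cp;
        return 1;
    }
    if (cp <= 0x7FF)
    {
        out[0] = (unsigned char) (0xC0 | (cp >> 6));
        out[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp <= 0xFFFF)
    {
        out[0] = (unsigned char) (0xE0 | (cp >> 12));
        out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char) (0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF)
    {
        out[0] = (unsigned char) (0xF0 | (cp >> 18));
        out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char) (0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/*
 * Hypothetical helper: decode the UTF-8 sequence starting at 'in' into a
 * code point. Assumes the input is well-formed; returns 0xFFFFFFFF if
 * the lead byte is not a valid UTF-8 lead byte.
 */
uint32_t
sketch_utf8_to_unicode(const unsigned char *in)
{
    if ((in[0] & 0x80) == 0)
        return in[0];
    if ((in[0] & 0xE0) == 0xC0)
        return ((uint32_t) (in[0] & 0x1F) << 6) | (in[1] & 0x3F);
    if ((in[0] & 0xF0) == 0xE0)
        return ((uint32_t) (in[0] & 0x0F) << 12) |
            ((uint32_t) (in[1] & 0x3F) << 6) | (in[2] & 0x3F);
    if ((in[0] & 0xF8) == 0xF0)
        return ((uint32_t) (in[0] & 0x07) << 18) |
            ((uint32_t) (in[1] & 0x3F) << 12) |
            ((uint32_t) (in[2] & 0x3F) << 6) | (in[3] & 0x3F);
    return 0xFFFFFFFF;
}
```

With helpers like these in one shared place, both the backend and the
normalization code could convert at the boundary and work on plain
codepoints in between.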

> pg_utf8_islegal() and pg_utf_mblen() should also be moved into their
> own file I think, and wchar.c can use that.

Yeah.

>> * The list of characters excluded from recomposition is currently hard-coded
>> in utf_norm_generate.pl. However, that list is available in machine-readable
>> format, in file CompositionExclusions.txt. Since we're reading most of the
>> data from UnicodeData.txt, would be good to read the exclusion table from a
>> file, too.
>
> Ouch. Those are present here...
> http://www.unicode.org/reports/tr41/tr41-19.html#Exclusions
> Definitely it makes more sense to read them from a file.

Did that.
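
To illustrate the file format only: in the patch the reading is done by
the Perl generator script, not in C. Each data line in
CompositionExclusions.txt starts with a hexadecimal codepoint, and
everything after '#' is a comment. A minimal sketch of parsing one line,
with a made-up function name:

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse one line of CompositionExclusions.txt.
 * Returns true and stores the codepoint in *cp if the line begins with a
 * hexadecimal codepoint. Returns false for blank lines and for lines
 * that begin with '#', which includes commented-out entries.
 */
bool
sketch_parse_exclusion_line(const char *line, uint32_t *cp)
{
    char       *end;
    unsigned long value;

    /* skip leading whitespace */
    while (*line == ' ' || *line == '\t')
        line++;

    /* comment lines, blank lines and anything else non-hex are ignored */
    if (!isxdigit((unsigned char) *line))
        return false;

    value = strtoul(line, &end, 16);
    if (end == line || value > 0x10FFFF)
        return false;

    *cp = (uint32_t) value;
    return true;
}
```

Feeding it the line "0958    #  DEVANAGARI LETTER QA" would yield the
codepoint 0x0958, while a line starting with '#' is skipped.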

>> * SASLPrep specifies normalization form KC, but it also specifies that some
>> characters are mapped to space or nothing. Should do those mappings, too.
>
> Ah, right. Those ones are here:
> https://tools.ietf.org/html/rfc3454#appendix-B.1

Yep.

Attached is a new version. Notable changes since yesterday:

* Implemented the rest of the SASLPrep, mapping some characters to
spaces, leaving out others, and checking for prohibited characters and
bidirectional strings.

* Moved things around. There's now a separate directory,
src/common/unicode, which contains the perl scripts and the test code.
Those are not needed to build from source, as the pre-generated tables
are put in src/include/common. Similar to the scripts in
src/backend/utils/mb/Unicode, really.

* Renamed many things from utf_* to unicode_*, since they don't deal
with utf-8 input anymore.
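
For illustration, the mapping step from the first item above can be
sketched roughly as follows. This is not the code in the patch: the
names are made up, the range tables are typed in by hand from RFC 3454
(C.1.2 and B.1) rather than generated, and normalization, the
prohibited-character check and the bidi check are all left out.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct
{
    uint32_t    first;
    uint32_t    last;
} sketch_range;

/* RFC 3454 table C.1.2: non-ASCII space characters, mapped to U+0020 */
static const sketch_range sketch_non_ascii_space[] = {
    {0x00A0, 0x00A0}, {0x1680, 0x1680}, {0x2000, 0x200B},
    {0x202F, 0x202F}, {0x205F, 0x205F}, {0x3000, 0x3000}
};

/* RFC 3454 table B.1: characters commonly mapped to nothing */
static const sketch_range sketch_mapped_to_nothing[] = {
    {0x00AD, 0x00AD}, {0x034F, 0x034F}, {0x1806, 0x1806},
    {0x180B, 0x180D}, {0x200B, 0x200D}, {0x2060, 0x2060},
    {0xFE00, 0xFE0F}, {0xFEFF, 0xFEFF}
};

#define SKETCH_LENGTHOF(a) (sizeof(a) / sizeof((a)[0]))

/* Linear search; returns 1 if cp falls inside any of the n ranges. */
static int
sketch_in_ranges(uint32_t cp, const sketch_range *ranges, size_t n)
{
    size_t      i;

    for (i = 0; i < n; i++)
    {
        if (cp >= ranges[i].first && cp <= ranges[i].last)
            return 1;
    }
    return 0;
}

/*
 * Hypothetical helper: apply the SASLprep mapping step to 'n' codepoints.
 * 'out' must have room for 'n' entries. Returns the output length.
 *
 * U+200B appears in both tables; this sketch checks the space table
 * first, which is an arbitrary choice made here.
 */
size_t
sketch_saslprep_map(const uint32_t *in, size_t n, uint32_t *out)
{
    size_t      i;
    size_t      j = 0;

    for (i = 0; i < n; i++)
    {
        if (sketch_in_ranges(in[i], sketch_non_ascii_space,
                             SKETCH_LENGTHOF(sketch_non_ascii_space)))
            out[j++] = 0x0020;  /* map to plain space */
        else if (sketch_in_ranges(in[i], sketch_mapped_to_nothing,
                                  SKETCH_LENGTHOF(sketch_mapped_to_nothing)))
            continue;           /* drop the character */
        else
            out[j++] = in[i];
    }
    return j;
}
```

Working on codepoints rather than UTF-8 bytes keeps the range tables
small and the lookups simple comparisons.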

This is starting to shape up, but there is still some cleanup work to
do. I will continue tomorrow.

- Heikki

Attachment: implement-SASLprep-3.patch.gz (application/gzip, 68.5 KB)
