Re: Password identifiers, protocol aging and SCRAM protocol

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Fetter <david(at)fetter(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Julian Markwort <julian(dot)markwort(at)uni-muenster(dot)de>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, Valery Popov <v(dot)popov(at)postgrespro(dot)ru>
Subject: Re: Password identifiers, protocol aging and SCRAM protocol
Date: 2017-01-18 05:46:16
Message-ID: CAB7nPqTQFMDcUL1u+5PSKfOXbws2vFYQa_K=sRCeGism5LL_uQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 20, 2016 at 10:47 AM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> And Heikki has mentioned to me that he'd prefer not having an extra
> dependency for the normalization, which is LGPL-licensed by the way.
> So I have looked at the SASLprep business to see what should be done
> to get a complete implementation in core, independent of any external
> library.
>
> The first thing is to be able to tell in the SCRAM code whether a
> string is UTF-8 or not, and this code is in src/common/. pg_wchar.c
> offers a set of routines exactly for this purpose, but it is built
> with libpq and not available to src/common/. So instead of moving the
> whole file, I'd like to create a new file, src/common/utf8.c, which
> includes pg_utf_mblen() and pg_utf8_islegal(). On top of that, I think
> that a routine able to check a full string would be useful for many
> callers, as pg_utf8_islegal() can only check one multibyte character
> at a time. If the password string is found to be valid UTF-8,
> SASLprep is applied. If not, the string is copied as-is, with perhaps
> unexpected effects for the client, but the client is in trouble
> already if it is not using UTF-8.
>
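The whole-string routine proposed above could look like the following sketch. The two helpers mimic the contracts of pg_utf_mblen() and pg_utf8_islegal() but are simplified stand-ins, not the PostgreSQL code itself; all names here are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Expected sequence length from the lead byte (cf. pg_utf_mblen()). */
static int
utf_mblen(const unsigned char *s)
{
    if ((*s & 0x80) == 0)
        return 1;               /* ASCII */
    if ((*s & 0xe0) == 0xc0)
        return 2;
    if ((*s & 0xf0) == 0xe0)
        return 3;
    if ((*s & 0xf8) == 0xf0)
        return 4;
    return 1;                   /* invalid lead byte, rejected below */
}

/* Validate one multibyte sequence (cf. pg_utf8_islegal()). */
static bool
utf8_islegal(const unsigned char *s, int len)
{
    switch (len)
    {
        case 1:
            return s[0] <= 0x7f;
        case 2:
            return s[0] >= 0xc2 && s[0] <= 0xdf &&      /* no overlongs */
                   s[1] >= 0x80 && s[1] <= 0xbf;
        case 3:
            if (s[0] == 0xe0 && s[1] < 0xa0)
                return false;                           /* overlong */
            if (s[0] == 0xed && s[1] > 0x9f)
                return false;                           /* surrogates */
            return s[0] >= 0xe0 && s[0] <= 0xef &&
                   s[1] >= 0x80 && s[1] <= 0xbf &&
                   s[2] >= 0x80 && s[2] <= 0xbf;
        case 4:
            if (s[0] == 0xf0 && s[1] < 0x90)
                return false;                           /* overlong */
            if (s[0] == 0xf4 && s[1] > 0x8f)
                return false;                           /* > U+10FFFF */
            return s[0] >= 0xf0 && s[0] <= 0xf4 &&
                   s[1] >= 0x80 && s[1] <= 0xbf &&
                   s[2] >= 0x80 && s[2] <= 0xbf &&
                   s[3] >= 0x80 && s[3] <= 0xbf;
        default:
            return false;
    }
}

/* The proposed full-string check: walk one character at a time,
 * refusing truncated or illegal sequences. */
static bool
utf8_string_islegal(const char *str, size_t length)
{
    const unsigned char *s = (const unsigned char *) str;
    const unsigned char *end = s + length;

    while (s < end)
    {
        int mblen = utf_mblen(s);

        if (s + mblen > end || !utf8_islegal(s, mblen))
            return false;
        s += mblen;
    }
    return true;
}
```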
> Then comes the real business... Note that's my first time touching
> encoding, particularly UTF-8 in depth, so please be nice. I may write
> things that are incorrect or sound so from here :)
>
> The second thing is the normalization itself. Per RFC 4013, NFKC needs
> to be applied to the string. The operation is described completely in
> [1]: it consists of 1) a compatibility decomposition of the string,
> followed by 2) a canonical composition.
>
> About 1). The compatibility decomposition is defined in [2], "by
> recursively applying the canonical and compatibility mappings, then
> applying the canonical reordering algorithm". The canonical and
> compatibility mappings are data available in UnicodeData.txt, the
> 6th column of the set defined in [3] to be precise. The meaning of the
> decomposition mappings is defined in [2] as well. The canonical
> decomposition basically looks up a given character and replaces it
> with the sequence of characters given by its mapping. The
> compatibility mappings should be applied as well, but [5], a perl tool
> called charlint.pl doing this normalization work, does not care about
> this phase... Do we?
>
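The recursive decomposition step described above could be sketched as follows, using a tiny hand-written stand-in for the table that would be generated from column 6 of UnicodeData.txt. Table contents and all names are illustrative only; the `compat` flag distinguishes the NFKC behavior (apply compatibility mappings too) from plain canonical decomposition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct
{
    unsigned int cp;            /* code point */
    bool         is_compat;     /* compatibility (<tag>) vs canonical */
    unsigned int decomp[4];     /* decomposition mapping, 0-terminated */
} DecompEntry;

/* Illustrative fragment of the generated table. */
static const DecompEntry decomp_table[] = {
    {0x00C5, false, {0x0041, 0x030A, 0}},   /* A-ring -> A + ring above */
    {0x00C9, false, {0x0045, 0x0301, 0}},   /* E-acute -> E + acute */
    {0x212B, false, {0x00C5, 0}},           /* angstrom -> A-ring (recursive) */
    {0xFB01, true,  {0x0066, 0x0069, 0}},   /* fi ligature (compatibility) */
};

static const DecompEntry *
find_decomp(unsigned int cp)
{
    for (size_t i = 0; i < sizeof(decomp_table) / sizeof(decomp_table[0]); i++)
        if (decomp_table[i].cp == cp)
            return &decomp_table[i];
    return NULL;
}

/* Recursively decompose one code point into *out, returning the number
 * of code points written.  compat=true gives the NFKC behavior. */
static int
decompose(unsigned int cp, bool compat, unsigned int *out)
{
    const DecompEntry *e = find_decomp(cp);
    int n = 0;

    if (e == NULL || (e->is_compat && !compat))
    {
        out[0] = cp;            /* no applicable mapping: keep as-is */
        return 1;
    }
    for (int i = 0; e->decomp[i] != 0; i++)
        n += decompose(e->decomp[i], compat, out + n);
    return n;
}
```

Note how U+212B decomposes to U+00C5, which itself decomposes further: that is the "recursively applying" part of the definition.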
> About 2)... Once the decomposition has been applied, those code points
> need to be recomposed using the Canonical_Combining_Class field of
> UnicodeData.txt in [3], which is the 3rd column of the set. Its values
> are defined in [4]. Another interesting thing: charlint.pl [5] does
> not care about this phase either. I am wondering if we should just
> drop this part as well...
>
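The Canonical_Combining_Class data mentioned above also drives the canonical reordering step that has to happen between decomposition and recomposition: runs of combining marks (class > 0) are sorted by class with a stable exchange, leaving starters (class 0) in place. A sketch, where get_ccc() stands in for a lookup in the generated table and its values are hand-picked for illustration:

```c
#include <assert.h>

/* Stand-in for a table lookup of the canonical combining class
 * (column 3 of UnicodeData.txt); values here are illustrative. */
static int
get_ccc(unsigned int cp)
{
    switch (cp)
    {
        case 0x0301: return 230;    /* combining acute accent */
        case 0x0323: return 220;    /* combining dot below */
        default:     return 0;      /* starters and everything else */
    }
}

/* Canonical reordering: a stable insertion-style exchange that sorts
 * adjacent combining marks by class without moving starters. */
static void
canonical_reorder(unsigned int *cps, int len)
{
    for (int i = 1; i < len; i++)
    {
        int ccc = get_ccc(cps[i]);

        if (ccc == 0)
            continue;
        /* bubble this mark left past marks with a higher class */
        for (int j = i; j > 0 && get_ccc(cps[j - 1]) > ccc; j--)
        {
            unsigned int tmp = cps[j - 1];

            cps[j - 1] = cps[j];
            cps[j] = tmp;
        }
    }
}
```

For example, "q + acute + dot below" reorders to "q + dot below + acute", since class 220 sorts before class 230.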
> Once 1) and 2) are done, NFKC is complete, and so is SASLprep.
>
> So what we need from the Postgres side is a mapping table with the
> following fields:
> 1) Hexadecimal code point of the character.
> 2) Its canonical combining class.
> 3) The kind of decomposition mapping, if defined.
> 4) The decomposition mapping, in hexadecimal format.
> Based on what I looked at, either perl or python could be used to
> process UnicodeData.txt and to generate a header file that would be
> included in the tree. There are 30k entries in UnicodeData.txt, 5k of
> which have a mapping, so that will result in sizable tables. One way
> to get good performance would be to store the length of the table in a
> static variable, order the entries by code point, and do a binary
> search to find an entry. We could as well use fancier things, like a
> set of tables forming a radix tree keyed on the decomposed bytes. We
> should end up doing only one table lookup per character anyway.
>
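The simpler of the two lookup schemes described above, a binary search over a table sorted by code point, could be sketched like this. The entry layout follows the four fields listed, the table fragment is hand-written for illustration, and all names are assumptions about what the generator would emit:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct
{
    unsigned int  codepoint;        /* 1) code point */
    unsigned char comb_class;       /* 2) canonical combining class */
    unsigned char decomp_type;      /* 3) 0 = canonical, 1 = compatibility */
    const unsigned int *decomp;     /* 4) decomposition mapping, or NULL */
} UnicodeEntry;

static const unsigned int decomp_00C9[] = {0x0045, 0x0301, 0};

/* Illustrative fragment of the generated header, sorted by codepoint
 * so that bsearch() applies; the real table would have ~30k entries. */
static const UnicodeEntry unicode_table[] = {
    {0x00C9, 0,   0, decomp_00C9},      /* E-acute */
    {0x0301, 230, 0, NULL},             /* combining acute accent */
    {0x0323, 220, 0, NULL},             /* combining dot below */
};

static int
entry_cmp(const void *key, const void *elem)
{
    unsigned int cp = *(const unsigned int *) key;
    const UnicodeEntry *e = elem;

    if (cp < e->codepoint)
        return -1;
    if (cp > e->codepoint)
        return 1;
    return 0;
}

/* O(log n) lookup; returns NULL for code points with no entry. */
static const UnicodeEntry *
unicode_lookup(unsigned int cp)
{
    return bsearch(&cp, unicode_table,
                   sizeof(unicode_table) / sizeof(unicode_table[0]),
                   sizeof(UnicodeEntry), entry_cmp);
}
```

A radix tree keyed on the decomposed bytes would trade memory for a constant number of indexed hops instead of log2(30k) comparisons; the binary search is the simpler first cut.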
> In conclusion, at this point I am looking for feedback regarding the
> following items:
> 1) Where to put the UTF8 check routines and what to move.
> 2) How to generate the mapping table using UnicodeData.txt. I'd think
> that using perl would be better.
> 3) The shape of the mapping table, which depends on how many
> operations we want to support in the normalization of the strings.
> The decisions on those items will drive the implementation in one
> direction or another.
>
> [1]: http://www.unicode.org/reports/tr15/#Description_Norm
> [2]: http://www.unicode.org/Public/5.1.0/ucd/UCD.html#Character_Decomposition_Mappings
> [3]: http://www.unicode.org/Public/5.1.0/ucd/UCD.html#UnicodeData.txt
> [4]: http://www.unicode.org/Public/5.1.0/ucd/UCD.html#Canonical_Combining_Class_Values
> [5]: https://www.w3.org/International/charlint/
>
> Heikki, others, thoughts?

FWIW, this patch is in a "waiting on author" state, and that's right.
As the discussion on SASLprep and the decisions regarding the way to
implement it, or at least ship it, are still pending, I am not
planning to move on with any implementation until we have a plan for
what to do. Just using libidn (LGPL) for a first shot would be rather
painless but... I am not alone here.
--
Michael
