Re: Implementation of SASLprep for SCRAM-SHA-256

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Implementation of SASLprep for SCRAM-SHA-256
Date: 2017-04-07 02:30:36
Message-ID: CAB7nPqTbECUQfWSOsagf83LO4n4LAYVxvmB96_JPs4EoOT0X=A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Apr 7, 2017 at 2:47 AM, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> On 04/06/2017 08:42 PM, Heikki Linnakangas wrote:
>>> There is for example this portion in the new tables:
>>> +static const Codepoint prohibited_output_chars[] =
>>> +{
>>> + 0xD800, 0xF8FF, /* C.3, C.5 */
>>>
>>> ----- Start Table C.5 -----
>>> D800-DFFF; [SURROGATE CODES]
>>> ----- End Table C.5 -----
>>> This indicates a range of values. Wouldn't it be better to split this
>>> table in two, one for the range of codepoints and another one with the
>>> single entries?
>>
>> I considered that, but there are relatively few singular codepoints in
>> the tables, so it wouldn't save much space. In this patch, singular
>> codepoints are represented by a range like "0x3000, 0x3000".

I am really wondering if this should not reflect the real range
reported by the RFC. I understand that you have grouped things to save
a couple of bytes, but that would protect from any updates of the
codepoints within those ranges (unlikely to happen I agree).

>>> + 0x1D173, 0x1D17A, /* C.2.2 */
>>> This is for musical symbols. It seems to me that checking for a range
>>> is what is intended.
>>
>> Can you elaborate?
>
> Oh, I think I understand the confusion now. All the arrays represent
> codepoint ranges, not singular codepoints. I renamed them to "*_ranges", to
> make that more clear.

Thanks! Yes I got confused by the name.

+/* Is the given Unicode codepoint in the given table? */
+#define IS_CODE_IN_TABLE(code, map) is_code_in_table(code, map, lengthof(map))
[...]
+static bool
+is_code_in_table(pg_wchar code, const pg_wchar *map, int mapsize)
+{
+ Assert(mapsize % 2 == 0);
+
+ if (code < map[0] || code > map[mapsize - 1])
+ return false;
+
+ if (bsearch(&code, map, mapsize / 2, sizeof(pg_wchar) * 2,
+ codepoint_range_cmp))
+ return true;
+ else
+ return false;
+}
Those could be renamed to range_table as well to avoid more confusion.

> The SASLprep implementation depends on the UTF-8 functions from
> src/backend/utils/mb/wchar.c. So to use it, you must also compile and link
> that. That doesn't change anything for the current users of these
> functions, the backend and libpq, as they both already link with wchar.o.
> It would be good to move those functions into a separate file in
> src/commmon, but I'll leave that for another day.

Fine for me.

You may want to add a .gitignore in src/common/unicode for norm_test
and norm_test_table.h.
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Craig Ringer 2017-04-07 02:32:21 Re: logical replication apply to run with sync commit off by default
Previous Message Craig Ringer 2017-04-07 02:30:25 Re: Faster methods for getting SPI results (460% improvement)