Re: Hostnames, IDNs, Punycode and Unicode Case Folding

From: Mike Cardwell <pgsql(at)lists(dot)grepular(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Hostnames, IDNs, Punycode and Unicode Case Folding
Date: 2014-12-30 00:26:11
Message-ID: 20141230002611.GB24297@glue.grepular.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

* on the Mon, Dec 29, 2014 at 05:01:44PM -0700, David G Johnston wrote:

>> CREATE UNIQUE INDEX hostnames_hostname_key ON hostnames
>> (lower(punycode_encode(hostname)));
>>
>> That would prevent adding more than one representation for the same
>> hostname
>> to the column.
>
> Except two different hostname can resolve to the same
> punycode_encode(hostname) value

That's exactly what I'm taking advantage of...

> so the unique index won't work.

I think you misunderstand what I'm suggesing. I'm suggesting storing the
unicode version in the database, and then creating a unique index on the
result of the punycode function to prevent more than one representation
of the same hostname being inserted.

> It was also mentioned that using the Perl encoding function was
> non-performant; which is why caching the data into a memoization table has
> value.

It was non-performant because it involved unnecessarily starting a Perl
interpreter. Not because the algorithm its self is slow.

> WHERE lower(punycode_encode(hostname)) =
> > lower(punycode_encode('any-representation'))
> >
>
> I'm not for knowing the rules of punycode but I'm not seeing what value
> lower() provides here...

Case insensitive matching. So that "EXAMPLE.COM" = "example.com"

--
Mike Cardwell https://grepular.com https://emailprivacytester.com
OpenPGP Key 35BC AF1D 3AA2 1F84 3DC3 B0CF 70A5 F512 0018 461F
XMPP OTR Key 8924 B06A 7917 AAF3 DBB1 BF1B 295C 3C78 3EF1 46B4

In response to

Browse pgsql-general by date

  From Date Subject
Next Message David Johnston 2014-12-30 00:26:40 Re: [GENERAL] Rollback on include error in psql
Previous Message Andrew Sullivan 2014-12-30 00:25:59 Re: Hostnames, IDNs, Punycode and Unicode Case Folding