Re: [PATCH] Expand character set for ltree labels

From: Garen Torikian <gjtorikian(at)gmail(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [PATCH] Expand character set for ltree labels
Date: 2022-10-04 23:16:30
Message-ID: CAGXsc+-jhKJvSaqTWYa_PkrmA0ANWPfpte41ijwYpUjx7GNrHQ@mail.gmail.com
Lists: pgsql-hackers

No, not quite.

Valid Punycode characters are `[A-Za-z0-9-]`. This proposal includes `-`,
as well as `#` and `;` for HTML entities.

I double-checked the RFC to see the valid Punycode characters and the set
above is indeed correct:
https://datatracker.ietf.org/doc/html/draft-ietf-idn-punycode-02#section-5
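As a quick illustration of that character set (a minimal sketch, not part of the patch), Python's built-in `punycode` codec can be used to check what an encoded label looks like. The helper name `to_punycode` and the sample strings are my own, chosen only for illustration. Note that the codec copies ASCII characters from the input through unchanged, so the check below only holds when the ASCII part of the input is itself letters, digits, or hyphens.

```python
import re

# Letters, digits, and hyphen: the set described above.
PUNYCODE_CHARS = re.compile(r"[A-Za-z0-9-]+")


def to_punycode(text: str) -> str:
    """Encode text with Python's built-in 'punycode' codec (hypothetical helper)."""
    return text.encode("punycode").decode("ascii")


encoded = to_punycode("münchen")
print(encoded)  # mnchen-3ya

# A label with no ASCII characters at all still encodes into the same set.
cjk = to_punycode("日本語")
print(PUNYCODE_CHARS.fullmatch(cjk) is not None)
```

The hyphen in `mnchen-3ya` is the delimiter between the copied ASCII characters and the encoded remainder, which is why `-` is the one character the existing label set is missing for this use.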

While it would be nice for ltree labels to support *any* printable
character, they can't, because symbols like `!` and `%` already have special
meaning in ltree query syntax. This proposal leaves those as they are and
does not reuse any character that already has a special meaning.
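To make the proposed set concrete, here is a rough sketch of the label check in Python rather than the C the patch itself would use. It assumes the existing label set is `[A-Za-z0-9_]` (my understanding of the current ltree documentation) and that the proposal adds exactly `-`, `#`, and `;`. The names `EXISTING_LABEL`, `PROPOSED_LABEL`, and `is_valid_label` are hypothetical and do not come from the ltree source.

```python
import re

# Assumed existing ltree label characters: alphanumerics and underscore.
EXISTING_LABEL = re.compile(r"[A-Za-z0-9_]+")

# Assumed proposed set: the existing characters plus '-', '#', and ';'.
PROPOSED_LABEL = re.compile(r"[A-Za-z0-9_\-#;]+")


def is_valid_label(label: str, pattern: re.Pattern = PROPOSED_LABEL) -> bool:
    """Return True if every character of a non-empty label is in the set."""
    return pattern.fullmatch(label) is not None


# A Punycode-style label needs the hyphen, so only the proposed set accepts it.
print(is_valid_label("mnchen-3ya"))                  # True
print(is_valid_label("mnchen-3ya", EXISTING_LABEL))  # False

# Characters with special meaning in queries stay rejected either way.
print(is_valid_label("a!b"))  # False
print(is_valid_label("a%b"))  # False
```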

On Tue, Oct 4, 2022 at 6:32 PM Nathan Bossart <nathandbossart(at)gmail(dot)com>
wrote:

> On Tue, Oct 04, 2022 at 12:54:46PM -0400, Garen Torikian wrote:
> > The punycode range of characters is the exact same set as the existing
> > ltree range, with the addition of a hyphen (-). Within this system, any
> > human language can be encoded using just A-Za-z0-9-.
>
> IIUC ASCII characters like '!' and '<' are valid Punycode characters, but
> even with your proposal, those wouldn't be allowed.
>
> --
> Nathan Bossart
> Amazon Web Services: https://aws.amazon.com
>
