Re: [PATCH] Expand character set for ltree labels

From: Garen Torikian <gjtorikian(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [PATCH] Expand character set for ltree labels
Date: 2022-10-05 19:34:49
Message-ID: CAGXsc+8ki-dAhX+it1xyyCk4zcMUX79ujVs-+xrrrHjzB5VKCA@mail.gmail.com
Lists: pgsql-hackers

Hi Tom,

> Perhaps the docs are a bit unclear about that, but it's not
> restricted to ASCII alphanumerics. AFAICS the code will accept
> whatever iswalpha() and iswdigit() will accept in the database's
> default locale.

Sorry, but I don't think that is correct. Here is the only place that
defines what constitutes a valid character:
https://github.com/postgres/postgres/blob/c3315a7da57be720222b119385ed0f7ad7c15268/contrib/ltree/ltree.h#L129

As you can see, there are no `is_*` calls at all. Where in this contrib
package do you see `iswalpha`? Perhaps I missed it.
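
To make the question concrete, a check of the kind described in the quoted
text (whatever iswalpha() and iswdigit() accept, plus `_`) would look roughly
like the sketch below. This is a standalone illustration, not the code in
contrib/ltree: `is_label_char` and `is_valid_label` are hypothetical names,
and the sketch assumes labels are already available as wide-character strings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper: true if c may appear in a label under an
 * "alphanumeric plus underscore" rule. Classification follows the
 * process's current LC_CTYPE locale.
 */
bool
is_label_char(wchar_t c)
{
    return iswalpha((wint_t) c) || iswdigit((wint_t) c) || c == L'_';
}

/*
 * Hypothetical helper: true if s is non-empty and every character
 * passes is_label_char().
 */
bool
is_valid_label(const wchar_t *s)
{
    assert(s != NULL);

    if (*s == L'\0')
        return false;

    for (; *s != L'\0'; s++)
    {
        if (!is_label_char(*s))
            return false;
    }
    return true;
}
```

Under a rule like this, `-`, `#` and `;` are rejected in any locale, while
whether a non-ASCII letter is accepted depends on the locale in effect
(for example after calling setlocale(LC_CTYPE, "")).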

> That seems really pretty random.

Ok. I am trying to head off a situation where other users want delimiters
other than `-`, which is already commonplace within words (e.g., compound
ones).

On Wed, Oct 5, 2022 at 2:59 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Garen Torikian <gjtorikian(at)gmail(dot)com> writes:
> > I am submitting a patch to expand the label requirements for ltree.
>
> > The current format is restricted to alphanumeric characters, plus _.
> > Unfortunately, for non-English labels, this set is insufficient.
>
> Hm? Perhaps the docs are a bit unclear about that, but it's not
> restricted to ASCII alphanumerics. AFAICS the code will accept
> whatever iswalpha() and iswdigit() will accept in the database's
> default locale. There's certainly work that could/should be done
> to allow use of not-so-default locales, but that's not specific
> to ltree. I'm not sure that doing an application-side encoding
> is attractive compared to just using that ability directly.
>
> If you do want to do application-side encoding, I'm unsure why
> punycode would be the choice anyway, as opposed to something
> that can fit in the existing restrictions.
>
> > On top of this, I added support for two more characters: # and ;, which
> > are used for HTML entities.
>
> That seems really pretty random.
>
> regards, tom lane
>
