From: | Garen Torikian <gjtorikian(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: [PATCH] Expand character set for ltree labels |
Date: | 2022-10-05 19:34:49 |
Message-ID: | CAGXsc+8ki-dAhX+it1xyyCk4zcMUX79ujVs-+xrrrHjzB5VKCA@mail.gmail.com |
Lists: | pgsql-hackers |
Hi Tom,
> Perhaps the docs are a bit unclear about that, but it's not
> restricted to ASCII alphanumerics. AFAICS the code will accept
> whatever iswalpha() and iswdigit() will accept in the database's
> default locale.
Sorry, but I don't think that is correct. Here is the single
definition that checks what constitutes a valid character:
https://github.com/postgres/postgres/blob/c3315a7da57be720222b119385ed0f7ad7c15268/contrib/ltree/ltree.h#L129
As you can see, there are no `is_*` calls at all. Where in this contrib
package do you see `iswalpha`? Perhaps I missed it.
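To make the two readings concrete, here is a minimal sketch of each rule. The helper names are hypothetical and this is not the actual ltree code; it only illustrates the difference between an ASCII-only check and one that defers to `iswalpha()`/`iswdigit()` in the current locale. Which of the two the macro linked above actually amounts to is exactly the question.

```c
#include <stdbool.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper (not from ltree): accept only ASCII letters,
 * ASCII digits, and underscore.
 */
static bool
label_char_ok_ascii(wchar_t c)
{
    return (c >= L'a' && c <= L'z') ||
           (c >= L'A' && c <= L'Z') ||
           (c >= L'0' && c <= L'9') ||
           c == L'_';
}

/*
 * Hypothetical helper (not from ltree): accept whatever the current
 * locale classifies as alphabetic or as a digit, plus underscore.
 * The result for non-ASCII input depends on the locale in effect.
 */
static bool
label_char_ok_locale(wchar_t c)
{
    return iswalpha((wint_t) c) || iswdigit((wint_t) c) || c == L'_';
}
```

Under either rule `-`, `#` and `;` are rejected; the two differ only in how non-ASCII letters are treated.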
> That seems really pretty random.
OK. I am trying to avoid a situation where users may wish to use
delimiters other than `-`, given how commonly it appears in words
(e.g., compound ones).
On Wed, Oct 5, 2022 at 2:59 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Garen Torikian <gjtorikian(at)gmail(dot)com> writes:
> > I am submitting a patch to expand the label requirements for ltree.
>
> > The current format is restricted to alphanumeric characters, plus _.
> > Unfortunately, for non-English labels, this set is insufficient.
>
> Hm? Perhaps the docs are a bit unclear about that, but it's not
> restricted to ASCII alphanumerics. AFAICS the code will accept
> whatever iswalpha() and iswdigit() will accept in the database's
> default locale. There's certainly work that could/should be done
> to allow use of not-so-default locales, but that's not specific
> to ltree. I'm not sure that doing an application-side encoding
> is attractive compared to just using that ability directly.
>
> If you do want to do application-side encoding, I'm unsure why
> punycode would be the choice anyway, as opposed to something
> that can fit in the existing restrictions.
>
> > On top of this, I added support for two more characters: # and ;, which are
> > used for HTML entities.
>
> That seems really pretty random.
>
> regards, tom lane
>