From: | Corey Huinker <corey(dot)huinker(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: ts_locale.c: why no t_isalnum() test? |
Date: | 2022-10-19 22:12:47 |
Message-ID: | CADkLM=fgm4_A7b9_pXE=QPCB+JpxD4sTRue4SXKk9TvkB0LWig@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, Oct 5, 2022 at 3:53 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I happened to wonder why various places are testing things like
>
> #define ISWORDCHR(c) (t_isalpha(c) || t_isdigit(c))
>
> rather than using an isalnum-equivalent test. The direct answer
> is that ts_locale.c/.h provides no such test function, which
> apparently is because there's not a lot of potential callers in
> the core code. However, both pg_trgm and ltree could benefit
> from adding one.
>
> There's no semantic hazard here: the documentation I consulted
> is all pretty explicit that is[w]alnum is true exactly when
> either is[w]alpha or is[w]digit are. For example, POSIX saith
>
> The iswalpha() and iswalpha_l() functions shall test whether wc is a
> wide-character code representing a character of class alpha in the
> current locale, or in the locale represented by locale, respectively;
> see XBD Locale.
>
> The iswdigit() and iswdigit_l() functions shall test whether wc is a
> wide-character code representing a character of class digit in the
> current locale, or in the locale represented by locale, respectively;
> see XBD Locale.
>
> The iswalnum() and iswalnum_l() functions shall test whether wc is a
> wide-character code representing a character of class alpha or digit
> in the current locale, or in the locale represented by locale,
> respectively; see XBD Locale.
>
> While I didn't try to actually measure it, these functions don't
> look remarkably cheap. Doing char2wchar() twice when we only need
> to do it once seems silly, and the libc functions themselves are
> probably none too cheap for multibyte characters either.
>
> Hence, I propose the attached. I got rid of some places that were
> unnecessarily checking pg_mblen before applying t_iseq(), too.
>
> regards, tom lane
>
>
I see this is already committed, but I'm curious: why do t_isalpha and
t_isdigit have the pair of /* TODO */ comments? This unfinished business
isn't explained anywhere in the file.
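
For context on the quoted proposal, here is a minimal standalone sketch of the "decode once, classify once" idea, compared with the two-test form behind ISWORDCHR. It is not the committed ts_locale.c code: classify_once, sketch_isalnum and sketch_iswordchr_two_step are hypothetical names, and plain mbtowc() stands in for PostgreSQL's char2wchar().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper: decode the first multibyte character of mbstr
 * to a wide character and apply one classification function to it.
 * Returns 1 if the predicate holds, 0 otherwise (including for an
 * empty string or an invalid multibyte sequence).
 */
static int
classify_once(const char *mbstr, int (*pred)(wint_t))
{
    wchar_t wc;

    assert(mbstr != NULL);
    if (*mbstr == '\0')
        return 0;
    /* mbtowc() returns -1 when the bytes do not form a valid character */
    if (mbtowc(&wc, mbstr, strlen(mbstr)) <= 0)
        return 0;
    return pred((wint_t) wc) != 0;
}

/* One decode, one libc test: the isalnum-equivalent form. */
static int
sketch_isalnum(const char *mbstr)
{
    return classify_once(mbstr, iswalnum);
}

/* Two decodes, two libc tests: the shape of the ISWORDCHR macro. */
static int
sketch_iswordchr_two_step(const char *mbstr)
{
    return classify_once(mbstr, iswalpha) || classify_once(mbstr, iswdigit);
}
```

Under the POSIX wording quoted above, both functions should agree for every input; the single-test form just avoids converting the same character twice.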