Re: ts_locale.c: why no t_isalnum() test?

From: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: ts_locale.c: why no t_isalnum() test?
Date: 2022-10-19 22:12:47
Message-ID: CADkLM=fgm4_A7b9_pXE=QPCB+JpxD4sTRue4SXKk9TvkB0LWig@mail.gmail.com
Lists: pgsql-hackers

On Wed, Oct 5, 2022 at 3:53 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> I happened to wonder why various places are testing things like
>
> #define ISWORDCHR(c) (t_isalpha(c) || t_isdigit(c))
>
> rather than using an isalnum-equivalent test. The direct answer
> is that ts_locale.c/.h provides no such test function, which
> apparently is because there's not a lot of potential callers in
> the core code. However, both pg_trgm and ltree could benefit
> from adding one.
>
> There's no semantic hazard here: the documentation I consulted
> is all pretty explicit that is[w]alnum is true exactly when
> either is[w]alpha or is[w]digit are. For example, POSIX saith
>
> The iswalpha() and iswalpha_l() functions shall test whether wc is a
> wide-character code representing a character of class alpha in the
> current locale, or in the locale represented by locale, respectively;
> see XBD Locale.
>
> The iswdigit() and iswdigit_l() functions shall test whether wc is a
> wide-character code representing a character of class digit in the
> current locale, or in the locale represented by locale, respectively;
> see XBD Locale.
>
> The iswalnum() and iswalnum_l() functions shall test whether wc is a
> wide-character code representing a character of class alpha or digit
> in the current locale, or in the locale represented by locale,
> respectively; see XBD Locale.
>
> While I didn't try to actually measure it, these functions don't
> look remarkably cheap. Doing char2wchar() twice when we only need
> to do it once seems silly, and the libc functions themselves are
> probably none too cheap for multibyte characters either.
>
> Hence, I propose the attached. I got rid of some places that were
> unnecessarily checking pg_mblen before applying t_iseq(), too.
>
> regards, tom lane
>
>
I see this is already committed, but I'm curious: why do t_isalpha and
t_isdigit have the pair of /* TODO */ comments? This unfinished business
isn't explained anywhere in the file.
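
For anyone reading along, the double-conversion point in the quoted message
can be sketched with plain libc calls. This is only an illustration under
stated assumptions, not the committed patch: the demo_* functions and the
DEMO_ISWORDCHR_* macros are hypothetical names, and mbrtowc() stands in for
the char2wchar() call that ts_locale.c uses.

```c
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper: decode the first multibyte character of s into *wc.
 * Returns 1 on success, 0 for an empty, invalid or incomplete sequence.
 */
static int
demo_decode(const char *s, wchar_t *wc)
{
    mbstate_t   st;
    size_t      n;

    memset(&st, 0, sizeof(st));
    n = mbrtowc(wc, s, strlen(s), &st);
    return n != (size_t) -1 && n != (size_t) -2 && n != 0;
}

/* Each of these pays for one multibyte-to-wide conversion per call. */
static int
demo_isalpha(const char *s)
{
    wchar_t     wc;

    return demo_decode(s, &wc) && iswalpha((wint_t) wc);
}

static int
demo_isdigit(const char *s)
{
    wchar_t     wc;

    return demo_decode(s, &wc) && iswdigit((wint_t) wc);
}

/* One conversion, one classification call. */
static int
demo_isalnum(const char *s)
{
    wchar_t     wc;

    return demo_decode(s, &wc) && iswalnum((wint_t) wc);
}

/* Old style: a non-alphabetic character is decoded twice. */
#define DEMO_ISWORDCHR_OLD(c) (demo_isalpha(c) || demo_isdigit(c))

/* New style: the character is decoded once. */
#define DEMO_ISWORDCHR_NEW(c) (demo_isalnum(c))
```

Both macros should give the same answer for any input, since POSIX defines
class alnum as alpha or digit; the second form just avoids decoding the same
character a second time whenever the alpha test fails.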