Re: Character classes

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: geert(dot)lobbestael(at)gmail(dot)com
Cc: pgsql-docs(at)lists(dot)postgresql(dot)org
Subject: Re: Character classes
Date: 2019-05-20 18:06:37
Message-ID: 24386.1558375597@sss.pgh.pa.us
Lists: pgsql-docs

PG Doc comments form <noreply(at)postgresql(dot)org> writes:
> On https://www.postgresql.org/docs/11/functions-matching.html paragraph
> 9.7.3.2. Bracket Expressions says "Standard character class names are:
> alnum, alpha, blank, cntrl, digit, graph, lower, print, punct, space, upper,
> xdigit". The class "ascii" exists, but is not mentioned (probably a
> combination of some of the other classes). Are there any other classes?

Hm, fair question. I think the text means to say that these are the
character class names required by the POSIX regexp spec, which is
accurate. A look into our src/backend/regex/regc_locale.c will show
you that we also implement "ascii", and no others. That probably ought
to be documented.

> Do they work only for ASCII characters (e.g. '\u00A0' is not picked up
> by '[:blank:]')?

The POSIX ones are implemented by calling the C library, so it's whatever
the ctype.h and wctype.h functions think is appropriate for your LC_CTYPE
setting.
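
As a rough illustration of that locale dependence, here is a minimal sketch that asks the C library directly whether a wide character belongs to a named class under a given LC_CTYPE locale. It is not PostgreSQL code: the helper name in_class_for_locale is hypothetical, and it only uses the standard setlocale(), wctype() and iswctype() calls. It also assumes that wchar_t holds Unicode code points, which is true on glibc but not guaranteed everywhere.

```c
#include <locale.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper (not taken from PostgreSQL): report whether wc belongs
 * to the named character class under the given LC_CTYPE locale.
 *
 * Returns 1 for a match, 0 for no match, and -1 if the locale is not
 * installed or the C library does not recognize the class name.
 *
 * Note: this changes the process-wide LC_CTYPE setting as a side effect.
 */
int
in_class_for_locale(const char *locale_name, const char *class_name, wchar_t wc)
{
    wctype_t cls;

    /* Switch only the character-classification category. */
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;

    /* Look up the class by name, e.g. "alpha", "space", "blank". */
    cls = wctype(class_name);
    if (cls == (wctype_t) 0)
        return -1;

    /* iswctype() returns nonzero on a match; normalize that to 1. */
    return iswctype((wint_t) wc, cls) != 0;
}
```

Calling something like in_class_for_locale("en_US.UTF-8", "blank", (wchar_t) 0x00A0) on your own machine shows what the local C library thinks of U+00A0. The locale name is only an example, and the answer can differ between platforms and locale definitions.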

The 20-year-old reference in our text to ctype(3) seems rather unhelpful
today; in the first place, there's no such man page on my Linux systems,
and in the second place, wctype(3) is more important if it exists, and
in the third place what a reader actually wants to know is that this
is controlled by the LC_CTYPE server parameter. It'd likely be better
to dump the man-page reference altogether and instead point readers to
our "Locale Support" chapter.

regards, tom lane
