Re: Bizarre behavior of \w in a regular expression bracket construct

From:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Joel Jacobson <joel(at)compiler(dot)org>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: Bizarre behavior of \w in a regular expression bracket construct
Date:	2021-02-24 16:47:49
Message-ID:	20210224164749.GA10596@alvherre.pgsql
Lists:	pgsql-hackers

On 2021-Feb-23, Tom Lane wrote:

> * Create infrastructure to allow treating \w as a character class
> in its own right. (I did not expose [[:word:]] as a class name,
> though it would be a little more symmetric to do so; should we?)

[:word:] appears to be a GNU extension (or at least a "bash-specific
character class"[1], though Emacs apparently supports it too); all the
other class names are mandated by POSIX[2].

[1] https://en.wikibooks.org/wiki/Regular_Expressions/POSIX_Basic_Regular_Expressions
[2] https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_05

I think it'd be fine to expose [:word:] ...

> [1] https://www.regular-expressions.info/charclasssubtract.html

I had never heard of this subtraction thing. Nightmarish and confusing
syntax, but useful.
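
For comparison, in an engine without subtraction syntax the same effect
can often be had with a negated bracket. The sketch below uses Python's
re module (an assumption made purely for illustration; it is not
PostgreSQL's regex engine and not part of the patch under discussion) to
match "word characters minus digits". The function name is hypothetical.

```python
import re

# "\w minus \d" without subtraction syntax: a negated bracket that lists
# \W and \d matches exactly the characters that are in \w but not in \d.
WORD_MINUS_DIGITS = re.compile(r"[^\W\d]+")


def word_runs_without_digits(text):
    """Return maximal runs of word characters that contain no digits."""
    return WORD_MINUS_DIGITS.findall(text)


if __name__ == "__main__":
    print(word_runs_without_digits("abc123_def"))
```

The double negative is arguably no prettier than subtraction, but it
only relies on \W and \d being usable inside a bracket expression.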

> + Also, the character class shorthands <literal>\D</literal>
> + and <literal>\W</literal> will match a newline regardless of this mode.
> + (Before <productname>PostgreSQL</productname> 14, they did not match
> + newlines in newline-sensitive mode.)

This seems an acceptable change to me, but then I only work here.
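
As a point of reference, other engines already treat the complemented
shorthands this way. A minimal sketch, again assuming Python's re module
rather than PostgreSQL (the helper name is hypothetical): \D and \W
match a newline even though "." does not by default.

```python
import re


def matches_newline(pattern):
    """Report whether the given pattern matches a lone newline character."""
    return re.fullmatch(pattern, "\n") is not None


if __name__ == "__main__":
    for pattern in (r"\D", r"\W", r"\w", r"."):
        # Python's re, not PostgreSQL's engine: shown only for comparison.
        print(pattern, matches_newline(pattern))
```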

--
Álvaro Herrera 39°49'30"S 73°17'W

In response to

Re: Bizarre behavior of \w in a regular expression bracket construct at 2021-02-23 17:15:29 from Tom Lane

Responses

Re: Bizarre behavior of \w in a regular expression bracket construct at 2021-02-24 17:11:51 from Tom Lane
