Re: Bizarre behavior of \w in a regular expression bracket construct

From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Joel Jacobson <joel(at)compiler(dot)org>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Bizarre behavior of \w in a regular expression bracket construct
Date: 2021-02-24 16:47:49
Message-ID: 20210224164749.GA10596@alvherre.pgsql
Lists: pgsql-hackers

On 2021-Feb-23, Tom Lane wrote:

> * Create infrastructure to allow treating \w as a character class
> in its own right. (I did not expose [[:word:]] as a class name,
> though it would be a little more symmetric to do so; should we?)

Apparently [:word:] is a GNU extension (or at least a "bash-specific
character class"[1], though Emacs apparently supports it too); all the
others are mandated by POSIX[2].

[1] https://en.wikibooks.org/wiki/Regular_Expressions/POSIX_Basic_Regular_Expressions
[2] https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_05

I think it'd be fine to expose [:word:] ...

> [1] https://www.regular-expressions.info/charclasssubtract.html

I had never heard of this subtraction thing. Nightmarish and confusing
syntax, but useful.
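
For anyone else who has not met it: in flavors that support subtraction,
a class such as [a-z-[aeiou]] means "any character in a-z except those in
aeiou". The sketch below is only an illustration in Python, not PostgreSQL
syntax or behavior. Python's re module has no subtraction, so the same set
is emulated with a negative lookahead; the pattern, the helper name
consonants() and the sample string are all made up for the example.

```python
import re

# Hypothetical example: "lowercase ASCII letters minus vowels".
# With class subtraction this would be written [a-z-[aeiou]].
# Python's re lacks subtraction, so emulate it: the negative lookahead
# rejects a vowel at the current position, then [a-z] consumes one letter.
CONSONANT = re.compile(r"(?![aeiou])[a-z]")


def consonants(text):
    """Return every lowercase ASCII consonant found in text, in order."""
    return CONSONANT.findall(text)


if __name__ == "__main__":
    print(consonants("postgres"))
```

The lookahead form works in most engines that lack subtraction, at the
cost of being even less readable than the subtraction syntax itself.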

> + Also, the character class shorthands <literal>\D</literal>
> + and <literal>\W</literal> will match a newline regardless of this mode.
> + (Before <productname>PostgreSQL</productname> 14, they did not match
> + newlines in newline-sensitive mode.)

This seems an acceptable change to me, but then I only work here.

--
Álvaro Herrera 39°49'30"S 73°17'W
