From: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Joel Jacobson <joel(at)compiler(dot)org>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Bizarre behavior of \w in a regular expression bracket construct |
Date: | 2021-02-24 16:47:49 |
Message-ID: | 20210224164749.GA10596@alvherre.pgsql |
Lists: | pgsql-hackers |
On 2021-Feb-23, Tom Lane wrote:
> * Create infrastructure to allow treating \w as a character class
> in its own right. (I did not expose [[:word:]] as a class name,
> though it would be a little more symmetric to do so; should we?)
Apparently [:word:] is a GNU extension (or at least a "bash-specific
character class"[1], though Emacs seems to support it too); all the
others are mandated by POSIX[2].
[1] https://en.wikibooks.org/wiki/Regular_Expressions/POSIX_Basic_Regular_Expressions
[2] https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_05
I think it'd be fine to expose [:word:] ...
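For anyone unfamiliar with the idea, here is a rough sketch of what a "word" class means when it sits inside a bracket expression next to other members. It uses Python's `re` module, which is a different engine from PostgreSQL's and has no `[[:word:]]`, so the class is spelled out by hand as an ASCII-only approximation of `[[:alnum:]_]`. The names `WORD` and `split_tokens` are made up for this illustration.

```python
import re

# Assumption: ASCII-only stand-in for a "word" class, i.e. [[:alnum:]_].
# Python's re has no [[:word:]], so the members are listed explicitly.
WORD = "A-Za-z0-9_"


def split_tokens(text):
    """Return runs of word characters, dots, and hyphens found in text."""
    # The word class is just one member of the bracket expression;
    # '.' is literal inside brackets and the trailing '-' is a literal hyphen.
    return re.findall(rf"[{WORD}.-]+", text)


print(split_tokens("foo_bar-1.2 baz!"))
```

The point is only that the class behaves like any other bracket member: it contributes its characters to the set and combines freely with the literals beside it.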
> [1] https://www.regular-expressions.info/charclasssubtract.html
I had never heard of this subtraction thing. Nightmarish and confusing
syntax, but useful.
> + Also, the character class shorthands <literal>\D</literal>
> + and <literal>\W</literal> will match a newline regardless of this mode.
> + (Before <productname>PostgreSQL</productname> 14, they did not match
> + newlines in newline-sensitive mode.)
This seems an acceptable change to me, but then I only work here.
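As a loose analogy for the documented behavior, the sketch below shows a complemented class matching a newline whether or not a line-oriented flag is set. It is Python's `re`, not PostgreSQL's engine, and `re.MULTILINE` is only a rough counterpart of newline-sensitive mode (it changes `^` and `$` only), so treat it as an illustration of the principle rather than of PostgreSQL itself. The helper name `non_word_matches_newline` is hypothetical.

```python
import re


def non_word_matches_newline(flags=0):
    """Report whether \\W matches the newline between 'a' and 'b'."""
    # \W is the complement of \w; a newline is not a word character,
    # so it belongs to the complemented set.
    return re.search(r"a\Wb", "a\nb", flags) is not None


print(non_word_matches_newline())              # no flags
print(non_word_matches_newline(re.MULTILINE))  # line-oriented anchors enabled
```

Both calls report a match, which is the same outcome the quoted documentation text describes for `\D` and `\W` from PostgreSQL 14 on.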
--
Álvaro Herrera 39°49'30"S 73°17'W