Re: Bizarre behavior of \w in a regular expression bracket construct

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Joel Jacobson <joel(at)compiler(dot)org>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Bizarre behavior of \w in a regular expression bracket construct
Date: 2021-02-21 17:39:45
Message-ID: 3467328.1613929185@sss.pgh.pa.us
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> writes:
> It looks like the interpretation of these other engines is that [\d-a]
> is the set of \d, the literal character "-", and the literal character
> "a". In other words, the - preceded by \d or \w (or any other character
> class, I guess?) loses its special meaning of identifying a character
> range.

Yeah. While I can see the attraction of being picky about this,
I can also see the attraction of being more compatible with other
engines. Should we relax this?

A quick experiment with perl shows that its opinion is "if the
atom before or after a potentially range-defining dash is a
character class, then take the dash as an ordinary character".
(This confirms Joel's result, and also I found that e.g. [3-\w]
treats the dash as a literal character.)
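
To make the rule above concrete, here is a minimal Python sketch of a toy bracket-body parser that applies it: a dash forms a range only when both neighbouring atoms are plain characters, and otherwise it is kept as a literal. This is not code from Perl or from the PostgreSQL regex engine. The names `tokenize`, `parse_bracket` and `CLASS_ESCAPES` are hypothetical, only the escapes \d, \w, \s and their uppercase forms are treated as classes, and details such as negation, POSIX classes and range-order checks are left out.

```python
# Toy model of the rule described above; NOT the Perl or PostgreSQL code.
# Assumption: only \d \D \w \W \s \S count as character-class escapes.
CLASS_ESCAPES = set("dDwWsS")


def tokenize(body):
    """Split the text between '[' and ']' into (kind, text) atoms."""
    atoms = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            if nxt in CLASS_ESCAPES:
                atoms.append(("class", "\\" + nxt))
            else:
                # An escaped character (including "\-") is a plain character.
                atoms.append(("char", nxt))
            i += 2
        elif ch == "-":
            # Unescaped dash: may or may not define a range.
            atoms.append(("dash", "-"))
            i += 1
        else:
            atoms.append(("char", ch))
            i += 1
    return atoms


def parse_bracket(body):
    """Return a list of ('range', lo, hi), ('class', esc) or ('lit', ch)."""
    atoms = tokenize(body)
    items = []
    i = 0
    while i < len(atoms):
        kind, text = atoms[i]
        dash_follows = i + 2 < len(atoms) and atoms[i + 1][0] == "dash"
        if dash_follows and kind == "char" and atoms[i + 2][0] == "char":
            # Plain characters on both sides: the dash defines a range.
            items.append(("range", text, atoms[i + 2][1]))
            i += 3
            continue
        # Otherwise emit the atom by itself; a dash next to a class
        # (or a leading/trailing dash) ends up as an ordinary character.
        items.append(("class", text) if kind == "class" else ("lit", text))
        i += 1
    return items
```

Under these assumptions, `parse_bracket(r"\d-a")` yields the class \d followed by a literal dash and a literal "a", and `parse_bracket(r"3-\w")` yields a literal "3", a literal dash and the class \w, while `parse_bracket("a-z")` is still a single range.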

> This one I didn't understand:
>> ^([\W])$ | pg |

I think Joel just forgot to mark that as ERROR. It certainly
doesn't work in our engine today (though I'm nearly done with
a patch to fix that).

regards, tom lane
