Re: Some regular-expression performance hacking

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Joel Jacobson" <joel(at)compiler(dot)org>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Some regular-expression performance hacking
Date: 2021-02-19 15:26:20
Message-ID: 3028106.1613748380@sss.pgh.pa.us
Lists: pgsql-hackers

"Joel Jacobson" <joel(at)compiler(dot)org> writes:
> On Thu, Feb 18, 2021, at 19:53, Tom Lane wrote:
>> (Having said that, I can't help noticing that a very large fraction
>> of those usages look like, eg, "[\w\W]". It seems to me that that's
>> a very expensive and unwieldy way to spell ".". Am I missing
>> something about what that does in Javascript?)

> I think this is a non-POSIX hack to match any character, including newlines,
> which are not included unless the "s" flag is set.

> "foo\nbar".match(/([\w\W]+)/)[1];
> "foo
> bar"

Oooh, that's very interesting. I guess the advantage of that over using
the 's' flag is that you can have different behaviors at different places
in the same regex.
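
To illustrate the point about per-position behavior, here is a minimal JavaScript sketch. The sample string and variable names are made up for illustration and do not come from the thread. It assumes a runtime that supports the "s" (dotAll) flag.

```javascript
// Hypothetical sample input: one header line followed by a multi-line body.
const text = "key: first\nsecond\nend";

// Without the "s" flag, "." stops at a newline, while [\w\W] crosses it,
// so both behaviors can coexist in a single regex.
const mixed = /key: (.+)\n([\w\W]+)/;
const m = text.match(mixed);
const firstLine = m[1]; // only the rest of the first line
const rest = m[2];      // everything after it, newlines included

// With the "s" flag, every "." in the regex matches newlines too.
const dotAll = /key: (.+)/s;
const wholeTail = text.match(dotAll)[1];

console.log(JSON.stringify(firstLine));
console.log(JSON.stringify(rest));
console.log(JSON.stringify(wholeTail));
```

The "s" flag applies to the whole pattern, whereas [\w\W] changes the behavior only where it is written.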

I was just wondering about this last night in fact, while hacking on
the code to get it to accept \W etc in bracket expressions. I see that
right now, our code thinks that NLSTOP mode ('n' switch, the opposite
of 's') should cause \W \D \S to not match newline. That seems a little
weird, not least because \S should probably be different from the other
two, and it isn't. And now we see it'd mean that you couldn't use the 'n'
switch to duplicate Javascript's default behavior in this area. Should we
change it? (I wonder what Perl does.)
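
For reference, the JavaScript default being discussed can be checked with a small sketch like the following. It shows only JavaScript's behavior, not what our regex code or Perl does, and the variable names are illustrative.

```javascript
// How JavaScript's complemented class escapes treat a newline
// when no flags are given.
const nl = "\n";

const bigW = /\W/.test(nl); // newline is not a word character, so \W matches it
const bigD = /\D/.test(nl); // newline is not a digit, so \D matches it
const bigS = /\S/.test(nl); // newline is whitespace, so \S does not match it
const dot = /./.test(nl);   // "." excludes newline unless the "s" flag is set

console.log({ bigW, bigD, bigS, dot });
```

In JavaScript, \S differs from \W and \D here simply because newline is itself a whitespace character.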

regards, tom lane
