Re: Some regular-expression performance hacking

From: "Joel Jacobson" <joel(at)compiler(dot)org>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Some regular-expression performance hacking
Date: 2021-02-19 12:45:34
Message-ID: b16bccc0-3f98-47ad-81aa-699a3e00630d@www.fastmail.com
Lists: pgsql-hackers

On Thu, Feb 18, 2021, at 19:53, Tom Lane wrote:
>(Having said that, I can't help noticing that a very large fraction
>of those usages look like, eg, "[\w\W]". It seems to me that that's
>a very expensive and unwieldy way to spell ".". Am I missing
>something about what that does in Javascript?)

This popular regex

^(?:\s*(<[\w\W]+>)[^>]*|#([\w-]+))$

is coming from jQuery:

// A simple way to check for HTML strings
// Prioritize #id over <tag> to avoid XSS via location.hash (#9521)
// Strict HTML recognition (#11290: must start with <)
// Shortcut simple #id case for speed
rquickExpr = /^(?:\s*(<[\w\W]+>)[^>]*|#([\w-]+))$/,

From: https://code.jquery.com/jquery-3.5.1.js

I think this is a non-POSIX hack to match any character, including newlines,
which "." does not match in JavaScript unless the "s" flag is set.

JavaScript test:

"foo\nbar".match(/(.+)/)[1];
"foo"

"foo\nbar".match(/(.+)/s)[1];
"foo
bar"

"foo\nbar".match(/([\w\W]+)/)[1];
"foo
bar"
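
The same comparison can be written as a small script, as a rough sketch rather than anything taken from jQuery itself. The rquickExpr pattern is copied from the jQuery source quoted above; the multi-line HTML input and the rquickDot variant (with "." substituted for [\w\W]) are hypothetical and exist only to show what would change if the pattern were spelled with a plain ".".

```javascript
// Sketch: "." with and without the s (dotAll) flag, compared to [\w\W].
const input = "foo\nbar";

const dotPlain = input.match(/(.+)/)[1];          // "." stops at the newline
const dotAll = input.match(/(.+)/s)[1];           // s flag: "." also matches "\n"
const wordNonWord = input.match(/([\w\W]+)/)[1];  // \w plus \W covers every character

// Pattern as quoted from jquery-3.5.1.js.
const rquickExpr = /^(?:\s*(<[\w\W]+>)[^>]*|#([\w-]+))$/;

// Hypothetical variant (NOT in jQuery): "." in place of [\w\W], no s flag.
const rquickDot = /^(?:\s*(<.+>)[^>]*|#([\w-]+))$/;

// Hypothetical input: an HTML string that spans several lines.
const multiLineHtml = "<div>\n  <p>hi</p>\n</div>";

const htmlMatch = rquickExpr.exec(multiLineHtml); // matches; group 1 is the whole string
const dotMatch = rquickDot.exec(multiLineHtml);   // null: "." cannot cross the newlines

console.log(JSON.stringify(dotPlain), JSON.stringify(dotAll), JSON.stringify(wordNonWord));
console.log(htmlMatch !== null, dotMatch);
```

So replacing [\w\W] with "." would only be equivalent if the regex is also run with newline-matching enabled for ".".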

/Joel
