From: | "Joel Jacobson" <joel(at)compiler(dot)org> |
---|---|
To: | "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Some regular-expression performance hacking |
Date: | 2021-02-19 12:45:34 |
Message-ID: | b16bccc0-3f98-47ad-81aa-699a3e00630d@www.fastmail.com |
Lists: | pgsql-hackers |
On Thu, Feb 18, 2021, at 19:53, Tom Lane wrote:
>(Having said that, I can't help noticing that a very large fraction
>of those usages look like, eg, "[\w\W]". It seems to me that that's
>a very expensive and unwieldy way to spell ".". Am I missing
>something about what that does in Javascript?)
This popular regex
^(?:\s*(<[\w\W]+>)[^>]*|#([\w-]+))$
is coming from jQuery:
// A simple way to check for HTML strings
// Prioritize #id over <tag> to avoid XSS via location.hash (#9521)
// Strict HTML recognition (#11290: must start with <)
// Shortcut simple #id case for speed
rquickExpr = /^(?:\s*(<[\w\W]+>)[^>]*|#([\w-]+))$/,
From: https://code.jquery.com/jquery-3.5.1.js
I think "[\w\W]" is a non-POSIX hack to match any character, including newlines,
which "." does not match in JavaScript unless the "s" flag is set.
JavaScript test:
"foo\nbar".match(/(.+)/)[1];
"foo"
"foo\nbar".match(/(.+)/s)[1];
"foo
bar"
"foo\nbar".match(/([\w\W]+)/)[1];
"foo
bar"
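To make the difference concrete for the jQuery case, here is a rough sketch that runs rquickExpr against a multi-line HTML string. The names dotExpr, dotAllExpr and html are made up for this illustration and do not appear in jQuery; only rquickExpr is copied from the source quoted above.

```javascript
// rquickExpr exactly as quoted above from jquery-3.5.1.js
const rquickExpr = /^(?:\s*(<[\w\W]+>)[^>]*|#([\w-]+))$/;

// Hypothetical variants, for comparison only: "[\w\W]" replaced by ".",
// first without and then with the "s" (dotAll) flag.
const dotExpr = /^(?:\s*(<.+>)[^>]*|#([\w-]+))$/;
const dotAllExpr = /^(?:\s*(<.+>)[^>]*|#([\w-]+))$/s;

// Hypothetical input: an HTML string containing newlines
const html = "<div>\n  <p>hi</p>\n</div>";

const withClass = rquickExpr.exec(html);   // [\w\W] crosses the newlines
const withDot = dotExpr.exec(html);        // "." stops at the first newline
const withDotAll = dotAllExpr.exec(html);  // "s" lets "." cross newlines

console.log(withClass !== null);   // true
console.log(withDot);              // null
console.log(withDotAll !== null);  // true
```

So, assuming jQuery needs to accept multi-line HTML strings and cannot rely on the "s" flag being available in every browser, "[\w\W]" is the portable spelling.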
/Joel