Re: Another regexp performance improvement: skip useless paren-captures

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Joel Jacobson <joel(at)compiler(dot)org>
Subject: Re: Another regexp performance improvement: skip useless paren-captures
Date: 2021-08-09 19:14:45
Message-ID: 3534632.1628536485@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com> writes:
> The patch looks ready to commit. I don't expect to test it any further unless you have something in particular you'd like me to focus on.

Pushed, but while re-reading it before commit I noticed that there's
some more fairly low-hanging fruit in regexp_replace(). As I had it
in that patch, it never used REG_NOSUB because of the possibility
that the replacement string uses "\N". However, we're already
pre-checking the replacement string to see if it has backslashes
at all, so while we're at it we can check for \N to discover if we
actually need any subexpression match data or not. We do need to
refactor a little to postpone calling pg_regcomp until after we
know that, but I think that makes replace_text_regexp's API less
ugly not more so.

While I was at it, I changed the search-for-backslash loops to
use memchr rather than handwritten looping. Their use of
pg_mblen was pretty unnecessary given we only need to find
backslashes, and we can assume the backend encoding is ASCII-safe.

Using a bunch of random cases generated by your little perl
script, I see maybe 10-15% speedup on test cases that don't
use \N in the replacement string, while it's about a wash
on cases that do. (If I'd been using a multibyte encoding,
maybe the memchr change would have made a difference, but
I didn't try that.)

regards, tom lane

Attachment Content-Type Size
let-regexp_replace-use-NOSUB.patch text/x-diff 8.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Mark Dilger 2021-08-09 19:19:55 Re: Use extended statistics to estimate (Var op Var) clauses
Previous Message Robert Haas 2021-08-09 19:00:40 using an end-of-recovery record in all cases