Re: Another regexp performance improvement: skip useless paren-captures

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Joel Jacobson <joel(at)compiler(dot)org>
Subject: Re: Another regexp performance improvement: skip useless paren-captures
Date: 2021-08-08 17:04:38
Message-ID: 3173384.1628442278@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com> writes:
> The patch triggers an assertion that master does not:

> +select 'azrlfkjbjgidgryryiglcabkgqluflu' !~ '(.(.)((.)))((?:(\3)))';

On looking into this, it's pretty simple: regexec.c has an assertion
that a pure-capture subre node ought to be doing some capturing.

case '(': /* no-op capture node */
assert(t->child != NULL);
assert(t->capno > 0);

That's fine as of HEAD, but with the proposed patch, we may notice
that the node isn't actually referenced by any backref, and remove
its capture marker, allowing this assertion to fire. Nothing's
really wrong though.

There seem to be three things we could do about that:

1. Extend removecaptures() so that it can actually remove no-op
capture nodes if it's removed their capture markings. This would
substantially complicate that function, and I judge that it's not
worth the trouble. We'll only have such nodes in cases of
capturing parentheses immediately surrounding capturing parentheses,
which doesn't seem like a case worth expending sweat for.

2. Just drop the "t->capno > 0" assertion in regexec.c.

3. Weaken said assertion, perhaps by also checking the BRUSE flag bit.

Not sure that I see any point to #3, so I just dropped the
assertion in the attached.

I've also rebased over the bug fixes from the other thread,
and added a couple more test cases.

regards, tom lane

Attachment Content-Type Size
optimize-useless-captures-2.patch text/x-diff 14.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Mark Dilger 2021-08-08 17:15:28 Re: Assert triggered during RE_compile_and_cache
Previous Message Mark Dilger 2021-08-08 17:00:10 Re: Assert triggered during RE_compile_and_cache