Re: Another regexp performance improvement: skip useless paren-captures

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Joel Jacobson <joel(at)compiler(dot)org>
Subject: Re: Another regexp performance improvement: skip useless paren-captures
Date: 2021-08-09 22:23:36
Message-ID: 3591269.1628547816@sss.pgh.pa.us
Lists: pgsql-hackers

I wrote:
> Hmmm ... yeah, I see it too. This points up something I'd wondered
> about before, which is whether the code that "cancels everything"
> after detecting {0} is really OK. It throws away the outer subre
> *and children* without worrying about what might be inside, and
> here we see that that's not good enough --- there's still a v->subs
> pointer to the first capturing paren set, which we just deleted,
> so that the \1 later on messes up. I'm not sure why the back
> branches are managing not to crash, but that might just be a memory
> management artifact.

... yeah, it is. For me, this variant hits the assertion in all
branches:

regression=# select regexp_split_to_array('', '((.)){0}(\2){0}');
server closed the connection unexpectedly

So that's a pre-existing (and very long-standing) bug. I'm not
sure if it has any serious impact in non-assert builds though.
Failure to clean out some disconnected arcs probably has no
real effect on the regex's behavior later.

regards, tom lane
