Re: Another regexp performance improvement: skip useless paren-captures

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Joel Jacobson <joel(at)compiler(dot)org>
Subject: Re: Another regexp performance improvement: skip useless paren-captures
Date: 2021-08-09 21:26:16
Message-ID: 3581981.1628544376@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com> writes:
> I can still trigger the old bug for which we thought we'd pushed a fix. The test case below crashes on master (e12694523e7e4482a052236f12d3d8b58be9a22c), and also on the fixed version "Make regexp engine's backref-related compilation state more bulletproof." (cb76fbd7ec87e44b3c53165d68dc2747f7e26a9a).

> Can you test if it crashes for you, too? I'm not sure I see why this one fails when millions of others pass.

> The backtrace is still complaining about regc_nfa.c:1265:

> +select regexp_split_to_array('', '(?:((?:q+))){0}(\1){0,0}?*[^]');
> +server closed the connection unexpectedly

Hmmm ... yeah, I see it too. This points up something I'd wondered
about before, which is whether the code that "cancels everything"
after detecting {0} is really OK. It throws away the outer subre
*and children* without worrying about what might be inside, and
here we see that that's not good enough --- there's still a v->subs
pointer to the first capturing paren set, which we just deleted,
so that the \1 later on messes up. I'm not sure why the back
branches are managing not to crash, but that might just be a memory
management artifact.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Melanie Plageman 2021-08-09 21:27:18 Re: Asynchronous and "direct" IO support for PostgreSQL.
Previous Message Peter Geoghegan 2021-08-09 21:05:32 Re: ECPG bug fix: DECALRE STATEMENT and DEALLOCATE, DESCRIBE