pgsql: Make regexp engine's backref-related compilation state more bull

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-committers(at)lists(dot)postgresql(dot)org
Subject: pgsql: Make regexp engine's backref-related compilation state more bull
Date: 2021-08-08 02:27:36
Message-ID: E1mCYXI-0005Se-W7@gemulon.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers

Make regexp engine's backref-related compilation state more bulletproof.

Up to now, we remembered the definition of a capturing parenthesis
subexpression by storing a pointer to the associated subRE node.
That was okay before, because that subRE didn't get modified anymore
while parsing the rest of the regexp. However, in the wake of
commit ea1268f63, that's no longer true: the outer invocation of
parseqatom() feels free to scribble on that subRE. This seems to
work anyway, because the states we jam into the child atom in the
"prepare a general-purpose state skeleton" stanza aren't really
semantically different from the original endpoints of the child atom.
But that would be mighty easy to break, and it's definitely not how
things worked before.

Between this and the issue fixed in the prior commit, it seems best
to get rid of this dependence on subRE nodes entirely. We don't need
the whole child subRE for future backrefs, only its starting and ending
NFA states; so let's just store pointers to those.

Also, in the corner case where we make an extra subRE to handle
immediately-nested capturing parentheses, it seems like it'd be smart
to have the extra subRE have the same begin/end states as the original
child subRE does (s/s2 not lp/rp). I think that linking it from lp to
rp might actually be semantically wrong, though since Spencer's original
code did it that way, I'm not totally certain. Using s/s2 is certainly
not wrong, in any case.

Per report from Mark Dilger. Back-patch to v14 where the problematic
patches came in.

Discussion: https://postgr.es/m/0203588E-E609-43AF-9F4F-902854231EE7@enterprisedb.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/cb76fbd7ec87e44b3c53165d68dc2747f7e26a9a

Modified Files
--------------
src/backend/regex/regcomp.c | 57 ++++++++++++++++++++++++++++-----------------
1 file changed, 35 insertions(+), 22 deletions(-)

Browse pgsql-committers by date

  From Date Subject
Next Message Tom Lane 2021-08-08 02:27:37 pgsql: Make regexp engine's backref-related compilation state more bull
Previous Message Andres Freund 2021-08-08 02:24:35 Re: pgsql: pgstat: Bring up pgstat in BaseInit() to fix uninitialized use o