From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | pgsql-committers(at)lists(dot)postgresql(dot)org |
Subject: | pgsql: Improve performance of regular expression back-references. |
Date: | 2021-03-02 17:14:26 |
Message-ID: | E1lH8bK-0006xb-3z@gemulon.postgresql.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-committers |
Improve performance of regular expression back-references.
In some cases, at the time that we're doing an NFA-based precheck
of whether a backref subexpression can match at a particular place
in the string, we already know which substring the referenced
subexpression matched. If so, we might as well forget about the NFA
and just compare the substring; this is faster and it gives an exact
rather than approximate answer.
In general, this optimization can help while we are prechecking within
the second child expression of a concat node, while the capture was
within the first child expression; then the substring was saved during
cdissect() of the first child and will be available to NFA checks done
while cdissect() recurses into the second child. It can help quite a
lot if the tree looks like
concat
/ \
capture concat
/ \
expensive stuff backref
as we will be able to avoid recursively dissecting the "expensive
stuff" before discovering that the backref isn't satisfied with a
particular midpoint that the lower concat node is testing. This
doesn't help if the concat tree is left-deep, as the capture node
won't get set soon enough (and it's hard to fix that without changing
the engine's match behavior). Fortunately, right-deep concat trees
are the common case.
Patch by me, reviewed by Joel Jacobson
Discussion: https://postgr.es/m/661609.1614560029@sss.pgh.pa.us
Branch
------
master
Details
-------
https://git.postgresql.org/pg/commitdiff/0c3405cf11a12da1a4278c6833f4d979fe06c866
Modified Files
--------------
src/backend/regex/rege_dfa.c | 117 +++++++++++++++++++++++++++++++++++++++++++
src/backend/regex/regexec.c | 24 +++++++--
2 files changed, 138 insertions(+), 3 deletions(-)
From | Date | Subject | |
---|---|---|---|
Next Message | Alvaro Herrera | 2021-03-02 18:41:00 | pgsql: Use native path separators to pg_ctl in initdb |
Previous Message | Michael Paquier | 2021-03-02 04:21:09 | pgsql: Fix duplicated test case in TAP tests of reindexdb |