Regexp: observable bug from careless usage of zaptreesubs

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Regexp: observable bug from careless usage of zaptreesubs
Date: 2021-08-23 15:43:07
Message-ID: 151435.1629733387@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

While looking at the regexp code, I started to get uncomfortable
about the implications of commit 0c3405cf1: it means that not
only the cdissect() phase, but also the preceding DFA-check phase
(longest()/shortest()) rely on saved subexpression match positions
to be valid for the match we're currently considering. It seemed
to me that the cdissect() recursion wasn't being careful to reset
the match pointers for an abandoned submatch before moving on to
the next attempt, meaning that dfa_backref() could conceivably get
applied using a stale match pointer.

Upon poking into it, I failed to find any bug of that exact ilk,
but what I did find was a pre-existing bug of not resetting an
abandoned match pointer at all. That allows these fun things:

regression=# select 'abcdef' ~ '^(.)\1|\1.';
?column?
----------
t
(1 row)

regression=# select 'abadef' ~ '^((.)\2|..)\2';
?column?
----------
t
(1 row)

In both of these examples, the (.) capture is in an alternation
branch that subsequently fails; therefore, the later backref
should never be able to match. But it does, because we forgot
to clear the capture's match data on the way out.

It turns out that this can be fixed using fewer, not more, zaptreesubs
calls, if we are careful to define the recursion rules precisely.
See attached.

This bug is ancient. I verified it as far back as PG 7.4, and
it can also be reproduced in Tcl, so it's likely aboriginal to
Spencer's library. It's not that surprising that no one has
reported it, because regexps that have this sort of possibly-invalid
backref are most likely incorrect. In most use-cases the existing
code will fail to match, as expected, so users would probably notice
that and fix their regexps without observing that there are cases
where the match incorrectly succeeds.

regards, tom lane

Attachment Content-Type Size
zap-when-zapping-is-needed.patch text/x-diff 7.6 KB

Browse pgsql-hackers by date

  From Date Subject
Next Message alvherre@alvh.no-ip.org 2021-08-23 15:49:27 Re: archive status ".ready" files may be created too early
Previous Message Alvaro Herrera 2021-08-23 15:40:20 Re: Mark all GUC variable as PGDLLIMPORT