Minor regexp bug

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Minor regexp bug
Date: 2015-11-07 02:32:48
Message-ID: 26464.1446863568@sss.pgh.pa.us
Lists: pgsql-hackers

Happened across this while investigating something else ...

The regexp documentation says:

Lookahead and lookbehind constraints cannot contain <firstterm>back
references</> (see <xref linkend="posix-escape-sequences">),
and all parentheses within them are considered non-capturing.

This is true if you try a simple case, e.g.

regression=# select 'xyz' ~ 'x(\w)(?=\1)';
ERROR: invalid regular expression: invalid backreference number

(That's not the greatest choice of error code, perhaps, but the point
is it rejects the backref.) However, stick an extra set of parentheses
in there, and the regexp parser forgets all about the restriction:

regression=# select 'xyz' ~ 'x(\w)(?=(\1))';
 ?column?
----------
 t
(1 row)

Since the execution machinery has no hope of executing such a backref
properly, you silently get a wrong answer. (The actual behavior in
a case like this is as though the backref had been replaced by a copy
of the bit of regexp pattern it'd referred to, i.e., this pattern works
like 'x(\w)(?=\w)', without any enforcement that the two \w's are
matching identical text.)
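
To make the difference concrete, here is a minimal sketch in Python. It assumes Python's `re` module, which (unlike the engine discussed here) does support back references inside lookahead, so it can show what the pattern ought to mean next to what it effectively does. The names `strict`, `loose`, and `has_match` are made up for this illustration.

```python
import re

# Intended meaning: the character after the captured one must be identical to it.
strict = re.compile(r'x(\w)(?=\1)')

# Effective behavior described above: the backref acts like a second,
# independent copy of \w, so any word character satisfies the lookahead.
loose = re.compile(r'x(\w)(?=\w)')


def has_match(pattern, text):
    """Return True if the compiled pattern matches anywhere in text."""
    return pattern.search(text) is not None


for text in ('xyz', 'xyy'):
    print(text, has_match(strict, text), has_match(loose, text))
```

With a working backref, 'xyz' should not match because 'z' differs from the captured 'y'; only the loose form accepts it, which is the answer reported above.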

The fix is a one-liner, as per the attached patch: when recursing
to parse the innards of a parenthesized subexpression, we have to
pass down the current "type", not necessarily PLAIN, so that any
type-related restrictions still apply inside the subexpression.
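
The idea can be sketched with a toy recursive parser in Python. This is not the attached patch and not the real parser code: the function names, the `pass_type_down` switch, and the error class are hypothetical, and the toy only understands plain parentheses, `(?=` / `(?!` lookahead, and single-digit back references. It does not check that parentheses balance.

```python
PLAIN, LACON = "PLAIN", "LACON"  # toy stand-ins for the parser's "type"


class ToyRegexError(ValueError):
    """Raised when a back reference appears where it is not allowed."""


def _parse(pat, i, kind, pass_type_down):
    """Scan from index i to the closing ')' or end of pattern.

    Returns the index just past what was consumed.
    """
    while i < len(pat):
        c = pat[i]
        if c == ')':
            return i + 1
        if c == '\\':
            nxt = pat[i + 1:i + 2]
            if nxt != "" and nxt in "123456789" and kind == LACON:
                raise ToyRegexError("backreference inside lookahead constraint")
            i += 2
        elif pat.startswith(('(?=', '(?!'), i):
            # Entering a lookahead constraint: everything inside is LACON.
            i = _parse(pat, i + 3, LACON, pass_type_down)
        elif c == '(':
            # The bug: always recursing with PLAIN forgets we are in a LACON.
            # The fix: hand the current type down to the subexpression.
            inner = kind if pass_type_down else PLAIN
            i = _parse(pat, i + 1, inner, pass_type_down)
        else:
            i += 1
    return i


def accepts(pattern, pass_type_down):
    """Return True if the toy parser accepts the pattern."""
    try:
        _parse(pattern, 0, PLAIN, pass_type_down)
        return True
    except ToyRegexError:
        return False


print(accepts(r'x(\w)(?=(\1))', pass_type_down=False))  # old behavior
print(accepts(r'x(\w)(?=(\1))', pass_type_down=True))   # with the fix
```

In this sketch the extra parentheses hide the backref only when the recursion resets the type to PLAIN; passing the type down makes the nested case fail the same way the simple case does.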

What I'm wondering about is whether to back-patch this. It's possible
that people have written patterns like this and not realized that they
aren't doing quite what's expected. Getting a failure instead might not
be desirable in a minor release. On the other hand, wrong answers are
wrong answers.

Thoughts?

regards, tom lane

Attachment: fix-parens-inside-lacons.patch (text/x-diff, 1.5 KB)
