Improved regular expression error message for backrefs

From: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
To: Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Improved regular expression error message for backrefs
Date: 2021-08-23 00:26:40
Message-ID: E77ABEF5-8CB5-4777-A654-1B1FA32D620E@enterprisedb.com
Lists: pgsql-hackers

Hackers,

Please find attached an improvement to the error messages given for invalid backreference usage:

select 'xyz' ~ '(.)(.)\3';
ERROR: invalid regular expression: invalid backreference number
select 'xyz' ~ '(.)(.)(?=\2)';
-ERROR: invalid regular expression: invalid backreference number
+ERROR: invalid regular expression: backreference in lookaround assertion

The first regexp is invalid because only two capture groups exist, so \3 doesn't refer to anything. The second regexp is rejected because the regular expression system does not support backreferences within lookaround assertions (see the docs, section 9.7.3.6, "Limits and Compatibility"). Saying that the backreference number is invalid is flat wrong: \2 refers to a perfectly valid capture group.
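
To make the distinction between the two failures concrete, here is a minimal standalone sketch in C. It is not the code from the attached patch and not PostgreSQL's regex compiler; the names toy_check_backrefs, toy_result, and the TOY_* codes are hypothetical and exist only for this illustration. It assumes single-digit backreferences, ignores bracket expressions, and arbitrarily reports the lookaround problem first when both problems apply.

```c
#include <stddef.h>

/* Hypothetical result codes for this sketch; these are NOT the values
 * or names used in regex/regex.h. */
typedef enum
{
    TOY_OK,           /* no backreference problem found */
    TOY_ESUBREG,      /* backreference number exceeds closed capture groups */
    TOY_ENOBREF,      /* backreference appears inside a lookaround assertion */
    TOY_UNSUPPORTED   /* nesting deeper than this sketch handles */
} toy_result;

/* Kinds of open parenthesis tracked on the stack. */
typedef enum { P_CAPTURE, P_NONCAPTURE, P_LOOKAROUND } paren_kind;

#define TOY_MAX_DEPTH 64

toy_result
toy_check_backrefs(const char *re)
{
    paren_kind  stack[TOY_MAX_DEPTH];
    int         depth = 0;      /* currently open parentheses */
    int         lookdepth = 0;  /* how many of those are lookarounds */
    int         nclosed = 0;    /* capture groups closed so far */
    size_t      i;

    for (i = 0; re[i] != '\0'; i++)
    {
        char        c = re[i];

        if (c == '\\')
        {
            char        next = re[i + 1];

            if (next == '\0')
                break;          /* trailing backslash: out of scope here */
            if (next >= '1' && next <= '9')
            {
                /* Arbitrary choice: report the lookaround case first. */
                if (lookdepth > 0)
                    return TOY_ENOBREF;
                if (next - '0' > nclosed)
                    return TOY_ESUBREG;
            }
            i++;                /* skip the escaped character */
        }
        else if (c == '(')
        {
            paren_kind  kind = P_CAPTURE;

            if (re[i + 1] == '?')
            {
                char        t = re[i + 2];

                /* (?=  (?!  (?<=  (?<!  are lookaround assertions */
                if (t == '=' || t == '!' ||
                    (t == '<' && (re[i + 3] == '=' || re[i + 3] == '!')))
                    kind = P_LOOKAROUND;
                else
                    kind = P_NONCAPTURE;
            }
            if (depth == TOY_MAX_DEPTH)
                return TOY_UNSUPPORTED;
            stack[depth++] = kind;
            if (kind == P_LOOKAROUND)
                lookdepth++;
        }
        else if (c == ')' && depth > 0)
        {
            paren_kind  kind = stack[--depth];

            if (kind == P_CAPTURE)
                nclosed++;
            else if (kind == P_LOOKAROUND)
                lookdepth--;
        }
    }
    return TOY_OK;
}
```

With this sketch, the first pattern above (written as the C string "(.)(.)\\3") yields TOY_ESUBREG, and the second ("(.)(.)(?=\\2)") yields TOY_ENOBREF, which mirrors the split between the two error messages.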

The patch defines a new error code, REG_ENOBREF, in regex/regex.h right next to REG_ESUBREG, from which it is split out, rather than at the end of the list. Is there a project preference for adding it at the end? That would certainly give a shorter git diff.

Are there dependencies on the current error messages that would prevent such changes?

Attachment: v1-0001-Distinguishing-regular-expression-backref-errors.patch (application/octet-stream, 6.4 KB)
