Re: Another regexp performance improvement: skip useless paren-captures

From: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Joel Jacobson <joel(at)compiler(dot)org>
Subject: Re: Another regexp performance improvement: skip useless paren-captures
Date: 2021-08-10 01:38:33
Message-ID: 1303A200-4615-4559-961A-CC906DE07EAF@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Aug 9, 2021, at 4:31 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> This patch should work OK in HEAD and v14, but it will need
> a bit of fooling-about for older branches I think, given that
> they fill v->subs[] a little differently.

Note that I tested your patch *before* master, so the changes look backwards.

I tested this fix and it seems good to me. Some patterns that used to work (by chance?) now raise an error, such as:

select 'bpgouiwcquu' ~ '(((([e])*?)){0,0}?(\3))';
-ERROR: invalid regular expression: invalid backreference number
+ ?column?
+----------
+ f
+(1 row)

I ran a lot of tests with the patch, and the asserts have all cleared up, but I don't know how to think about the user facing differences. If we had a good reason for raising an error for these back-references, maybe that'd be fine, but it seems to just be an implementation detail.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Smith 2021-08-10 02:17:07 Re: Small documentation improvement for ALTER SUBSCRIPTION
Previous Message Mark Dilger 2021-08-10 01:24:35 Re: Another regexp performance improvement: skip useless paren-captures