From: | Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Joel Jacobson <joel(at)compiler(dot)org> |
Subject: | Re: Another regexp performance improvement: skip useless paren-captures |
Date: | 2021-08-10 00:14:29 |
Message-ID: | 80944B12-6B9A-443F-B4F8-95B04F85E28A@enterprisedb.com |
Lists: | pgsql-hackers |
> On Aug 9, 2021, at 4:31 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> There is a potentially interesting definitional question:
> what exactly ought this regexp do?
>
> ((.)){0}\2
>
> Because the capturing paren sets are zero-quantified, they will
> never be matched to any characters, so the backref can never
> have any defined referent.
Perl regular expressions are not POSIX, but if there is a principled reason POSIX should differ from Perl on this, we should be clear about what that is:
#!/usr/bin/perl
use strict;
use warnings;

our $match;
if ('foo' =~ m/((.)(??{ die; })){0}(..)/)
{
    print "captured 1 $1\n" if defined $1;
    print "captured 2 $2\n" if defined $2;
    print "captured 3 $3\n" if defined $3;
    print "captured 4 $4\n" if defined $4;
    print "match = $match\n" if defined $match;
}
This will print "captured 3 fo", proving that although the regular expression is parsed with the (..) bound to the third capture group, the first two capture groups never run. If you don't believe that, change the {0} to {1} and observe that the script dies.
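The {0} versus {1} comparison can be run side by side by trapping the die with eval. This is only a sketch: the variable names $zero and $one are made up for illustration, and it assumes a Perl recent enough that a die inside a (??{ ... }) block propagates cleanly to an enclosing eval.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# {0}: the quantified group is skipped entirely, so the die inside it
# should never be reached and the trailing (..) still becomes group 3.
my $zero = eval {
    'foo' =~ m/((.)(??{ die; })){0}(..)/ ? "matched, group 3 = $3" : "no match";
};
print defined $zero ? "{0}: $zero\n" : "{0}: died inside the regex\n";

# {1}: the group has to run once, so the (??{ die; }) block executes
# and the eval returns undef.
my $one = eval {
    'foo' =~ m/((.)(??{ die; })){1}(..)/ ? "matched, group 3 = $3" : "no match";
};
print defined $one ? "{1}: $one\n" : "{1}: died inside the regex\n";
```

The only difference between the two patterns is the quantifier, so any difference in the output is down to whether the first group was ever entered.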
> So I think throwing an
> error is an appropriate response. The existing code will
> throw such an error for
>
> ((.)){0}\1
>
> so I guess Spencer did think about this to some extent -- he
> just forgot about the possibility of nested parens.
Ugh. That means our code throws an error where Perl does not, which pretty well negates my point above. If we're already throwing an error for this type of thing, I agree we should be consistent about it. My personal preference would have been to do the same thing as Perl, but it seems that ship has already sailed.
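For comparison, here is a sketch of how Perl handles the nested-paren pattern quoted above. As far as I understand perlre, a backreference to a group that did not participate in the match simply fails to match, so this is expected to compile and report a non-match rather than raise an error. The $result variable is just for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Group 2 exists syntactically, so Perl accepts the \2 backreference,
# but the zero-quantified groups never capture anything.
my $result = eval {
    'foo' =~ m/((.)){0}\2/ ? "match" : "no match";
};
print defined $result ? "result: $result\n" : "error: $@";
```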
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company