Re: Another regexp performance improvement: skip useless paren-captures

From: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Joel Jacobson <joel(at)compiler(dot)org>
Subject: Re: Another regexp performance improvement: skip useless paren-captures
Date: 2021-08-10 00:14:29
Message-ID: 80944B12-6B9A-443F-B4F8-95B04F85E28A@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Aug 9, 2021, at 4:31 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> There is a potentially interesting definitional question:
> what exactly ought this regexp do?
>
> ((.)){0}\2
>
> Because the capturing paren sets are zero-quantified, they will
> never be matched to any characters, so the backref can never
> have any defined referent.

Perl regular expressions are not POSIX, but if there is a principled reason POSIX should differ from perl on this, we should be clear what that is:

#!/usr/bin/perl

use strict;
use warnings;

our $match;
if ('foo' =~ m/((.)(??{ die; })){0}(..)/)
{
print "captured 1 $1\n" if defined $1;
print "captured 2 $2\n" if defined $2;
print "captured 3 $3\n" if defined $3;
print "captured 4 $4\n" if defined $4;
print "match = $match\n" if defined $match;
}

This will print "captured 3 fo", proving that although the regular expression is parsed with the (..) bound to the third capture group, the first two capture groups never run. If you don't believe that, change the {0} to {1} and observe that the script dies.

> So I think throwing an
> error is an appropriate response. The existing code will
> throw such an error for
>
> ((.)){0}\1
>
> so I guess Spencer did think about this to some extent -- he
> just forgot about the possibility of nested parens.

Ugg. That means our code throws an error where perl does not, pretty well negating my point above. If we're already throwing an error for this type of thing, I agree we should be consistent about it. My personal preference would have been to do the same thing as perl, but it seems that ship has already sailed.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Mark Dilger 2021-08-10 00:18:27 Re: Another regexp performance improvement: skip useless paren-captures
Previous Message Alvaro Herrera 2021-08-10 00:10:29 Re: Autovacuum on partitioned table (autoanalyze)