From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com> |
Cc: | Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Joel Jacobson <joel(at)compiler(dot)org> |
Subject: | Re: Another regexp performance improvement: skip useless paren-captures |
Date: | 2021-08-09 23:31:41 |
Message-ID: | 3643732.1628551901@sss.pgh.pa.us |
Lists: | pgsql-hackers |
Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com> writes:
> +select regexp_split_to_array('', '(?:((?:q+))){0}(\1){0,0}?*[^]');
> +server closed the connection unexpectedly
Here's a quick draft patch for this. Basically it moves the
responsibility for clearing v->subs[] pointers into the freesubre()
recursion, so that it will happen for contained capturing parens
not only the top level.
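To make the change concrete, here is a minimal C sketch of the idea under simplified, assumed data structures. It is not the actual patch and not PostgreSQL's struct subre. The names node, vars, mk, free_tree and NSUBS are all hypothetical. The point is that clearing subs[] happens inside the recursive free, so every capturing group in a discarded subtree loses its entry, not just the outermost one.

```c
#include <assert.h>
#include <stdlib.h>

#define NSUBS 10    /* hypothetical fixed size; a real table is sized per regex */

/* Hypothetical, simplified parse-tree node (NOT PostgreSQL's struct subre). */
typedef struct node
{
    int          capno;     /* capture-group number, or 0 if non-capturing */
    struct node *child;     /* first contained node, if any */
    struct node *sibling;   /* next node at the same nesting level */
} node;

/* Hypothetical stand-in for the compiler state that owns subs[]. */
typedef struct vars
{
    node       *subs[NSUBS];    /* subs[n] points at capture group n */
} vars;

static node *
mk(int capno, node *child)
{
    node       *n = calloc(1, sizeof(node));

    assert(n != NULL);          /* sketch only: no real out-of-memory handling */
    n->capno = capno;
    n->child = child;
    return n;
}

/*
 * Free a tree, clearing v->subs[] for EVERY capturing node encountered,
 * nested ones included.  If only the top-level node's entry were cleared,
 * the inner group of ((.)){0} would leave a dangling pointer behind.
 */
static void
free_tree(vars *v, node *n)
{
    while (n != NULL)
    {
        node       *next = n->sibling;

        if (n->capno > 0 && n->capno < NSUBS && v->subs[n->capno] == n)
            v->subs[n->capno] = NULL;
        free_tree(v, n->child);
        free(n);
        n = next;
    }
}
```

For ((.)){0}, build an outer node with capno 1 whose child has capno 2, register both in subs[], and call free_tree() on the outer node when the {0} quantifier causes it to be discarded; afterwards both subs[1] and subs[2] are NULL.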
There is a potentially interesting definitional question:
what exactly ought this regexp to do?
((.)){0}\2
Because the capturing paren sets are zero-quantified, they will
never be matched to any characters, so the backref can never
have any defined referent. I suspect that study of the POSIX
spec would lead to the conclusion that this is a legal regexp
but it will never match anything. Implementing that would be
tedious though, and what's more it seems very unlikely that
the user wanted any such behavior. So I think throwing an
error is an appropriate response. The existing code will
throw such an error for
((.)){0}\1
so I guess Spencer did think about this to some extent -- he
just forgot about the possibility of nested parens.
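The rule being adopted can be sketched as a simple validity check. This is a hypothetical illustration, not the code in the patch: backref_ok is an invented name, and it assumes that subs[n] is non-NULL only while capture group n still exists in the parse tree. Once a zero-quantified group has been discarded and its entry cleared, a backreference to it is rejected. With the nested clearing described above, that holds for \2 in ((.)){0}\2 just as it already did for \1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical backreference check: \n is acceptable only if capture
 * group n exists and its subs[] entry was not cleared because the group
 * was quantified {0} and thrown away.
 */
static bool
backref_ok(const void *const *subs, int nsubs, int n)
{
    assert(subs != NULL);
    return n >= 1 && n < nsubs && subs[n] != NULL;
}
```

With both entries cleared after parsing ((.)){0}, backref_ok() returns false for n = 1 and n = 2, which corresponds to raising an error. If only the outer entry had been cleared, the inner one would still be non-NULL yet point at freed memory, which is how the nested case slipped past the existing error.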
This patch should work OK in HEAD and v14, but it will need
a bit of fooling-about for older branches I think, given that
they fill v->subs[] a little differently.
regards, tom lane
Attachment | Content-Type | Size |
---|---|---|
fix-zero-quantified-nested-parens.patch | text/x-diff | 3.2 KB |