From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com> |
Cc: | Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Joel Jacobson <joel(at)compiler(dot)org> |
Subject: | Re: Another regexp performance improvement: skip useless paren-captures |
Date: | 2021-08-09 22:23:36 |
Message-ID: | 3591269.1628547816@sss.pgh.pa.us |
Lists: | pgsql-hackers |

I wrote:
> Hmmm ... yeah, I see it too. This points up something I'd wondered
> about before, which is whether the code that "cancels everything"
> after detecting {0} is really OK. It throws away the outer subre
> *and children* without worrying about what might be inside, and
> here we see that that's not good enough --- there's still a v->subs
> pointer to the first capturing paren set, which we just deleted,
> so that the \1 later on messes up. I'm not sure why the back
> branches are managing not to crash, but that might just be a memory
> management artifact.

... yeah, it is. For me, this variant hits the assertion in all
branches:

regression=# select regexp_split_to_array('', '((.)){0}(\2){0}');
server closed the connection unexpectedly

So that's a pre-existing (and very long-standing) bug. I'm not
sure if it has any serious impact in non-assert builds though.
Failure to clean out some disconnected arcs probably has no
real effect on the regex's behavior later.

regards, tom lane
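
The dangling-pointer problem described in the quoted text can be illustrated with a toy model in C. This is a minimal sketch under stated assumptions, not PostgreSQL's regcomp.c and not necessarily how the bug was fixed: `struct node`, `struct vars`, `new_node`, `free_subtree`, and `backref_ok` are hypothetical names invented for the example. It shows one possible way to keep a table of capture groups (standing in for v->subs) consistent when a subtree is discarded, so that a later backreference to a deleted group can be detected instead of following a stale pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_SUBS 10

/* Hypothetical stand-in for a regex subexpression tree node. */
struct node {
    int capno;              /* capture group number, 0 if non-capturing */
    struct node *child;     /* first child, or NULL */
    struct node *sibling;   /* next sibling, or NULL */
};

/* Hypothetical stand-in for the compiler state's capture-group table. */
struct vars {
    struct node *subs[MAX_SUBS];    /* subs[n] is the node for group n, or NULL */
};

/* Allocate a node and, if it captures, register it in the table. */
struct node *
new_node(struct vars *v, int capno, struct node *child)
{
    struct node *n = calloc(1, sizeof(*n));

    assert(n != NULL);
    assert(capno >= 0 && capno < MAX_SUBS);
    n->capno = capno;
    n->child = child;
    if (capno > 0)
        v->subs[capno] = n;
    return n;
}

/*
 * Free a node and all of its children.  Unlike a free that ignores what is
 * inside the subtree, this also clears every table entry pointing at a node
 * being freed, so no dangling pointer is left behind.
 */
void
free_subtree(struct vars *v, struct node *n)
{
    struct node *c, *next;

    if (n == NULL)
        return;
    for (c = n->child; c != NULL; c = next) {
        next = c->sibling;
        free_subtree(v, c);
    }
    if (n->capno > 0 && v->subs[n->capno] == n)
        v->subs[n->capno] = NULL;
    free(n);
}

/* Return 1 if group n still exists and may be referenced, else 0. */
int
backref_ok(const struct vars *v, int n)
{
    return n > 0 && n < MAX_SUBS && v->subs[n] != NULL;
}
```

In this model, building an outer group 1 around an inner group 2 and then calling `free_subtree` on the outer node leaves both `subs[1]` and `subs[2]` set to NULL, so a later check for \2 reports that the group is gone.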