From: | Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Joel Jacobson <joel(at)compiler(dot)org> |
Subject: | Re: Another regexp performance improvement: skip useless paren-captures |
Date: | 2021-08-08 18:22:24 |
Message-ID: | BDB634FD-2EE5-4697-91A0-1F53E1363D3B@enterprisedb.com |
Lists: | pgsql-hackers |
> On Aug 8, 2021, at 10:04 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> I've also rebased over the bug fixes from the other thread,
> and added a couple more test cases.
>
> regards, tom lane
Hmm. This changes the behavior when applied against master (c1132aae336c41cf9d316222e525d8d593c2b5d2):
select regexp_split_to_array('uuuzkodphfbfbfb', '((.))(\1\2)', 'ntw');
regexp_split_to_array
-----------------------
- {"",zkodphfbfbfb}
+ {uuuzkodphfbfbfb}
(1 row)
The string starts with three "u" characters. The first of them is captured twice, once by the outer parentheses and once by the inner ones, so \1 and \2 both refer to that first "u". The (\1\2) that follows then matches the next two "u" characters, and the whole pattern consumes "uuu". When the extra "useless" capture group is skipped, apparently this doesn't work anymore. I haven't looked at your patch, so I'm not sure why, but I'm guessing that \2 no longer refers to anything.
That analysis is consistent with the next change:
select regexp_split_to_array('snfwbvxeesnzqabixqbixqiumpgxdemmxvnsemjxgqoqknrqessmcqmfslfspskqpqxe', '((((?:.))))\3');
- regexp_split_to_array
----------------------------------------------------------------------
- {snfwbvx,snzqabixqbixqiumpgxde,xvnsemjxgqoqknrqe,mcqmfslfspskqpqxe}
+ regexp_split_to_array
+------------------------------------------------------------------------
+ {snfwbvxeesnzqabixqbixqiumpgxdemmxvnsemjxgqoqknrqessmcqmfslfspskqpqxe}
(1 row)
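The pattern should match any doubled character: the three nested capture groups all capture the same single character, and \3 requires that character to repeat. I would expect it to match the "ee", the "mm" and the "ss" in the text, which is what master does. With the patched code, it matches nothing.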
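As a rough cross-check of the expected results, here is a minimal sketch using Python's re module. This is a different regex engine from PostgreSQL's, so it only illustrates the backreference semantics I'd expect, not the server's behavior. The helper split_on_matches is a made-up name for illustration, and the 'ntw' flags from the first query are not reproduced here (I'm assuming they don't matter for these inputs, since neither string contains a newline).

```python
import re

def split_on_matches(pattern, text):
    """Split text at every non-overlapping match of pattern.

    Hypothetical helper: unlike re.split, it does not interleave the
    captured groups into the result, which makes the output comparable
    in shape to regexp_split_to_array.
    """
    pieces = []
    last = 0
    for m in re.finditer(pattern, text):
        pieces.append(text[last:m.start()])
        last = m.end()
    pieces.append(text[last:])
    return pieces

# First case: \1 and \2 both capture the leading "u", so the pattern
# consumes "uuu" and leaves an empty string before the match.
first = split_on_matches(r'((.))(\1\2)', 'uuuzkodphfbfbfb')

# Second case: \3 refers to the third nested group, which captures the
# same single character as groups 1 and 2, so any doubled character matches.
text = 'snfwbvxeesnzqabixqbixqiumpgxdemmxvnsemjxgqoqknrqessmcqmfslfspskqpqxe'
second = split_on_matches(r'((((?:.))))\3', text)
doubled = [m.group(0) for m in re.finditer(r'((((?:.))))\3', text)]

print(first)    # ['', 'zkodphfbfbfb']
print(doubled)  # ['ee', 'mm', 'ss']
print(second)
```

Both results agree with the "-" lines in the diffs above, i.e. with the unpatched behavior.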
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company