Re: Regex back-reference semantics and performance

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Joel Jacobson" <joel(at)compiler(dot)org>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Regex back-reference semantics and performance
Date: 2021-03-01 20:22:08
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Joel Jacobson" <joel(at)compiler(dot)org> writes:
> On Mon, Mar 1, 2021, at 01:53, Tom Lane wrote:
>> 0001-fix-backref-semantics.patch
>> 0002-backref-performance-hack.patch

> I've successfully tested both patches.

Again, thanks for testing!

> On HEAD the trouble-query took forever, I cancelled it after 23 minutes.

Yeah, I have not had the patience to run it to completion either.

> No notable timing differences:

I'm seeing a win of maybe 1% across the entire corpus, which isn't
much but it's something. It's not too surprising that this backref
issue is seldom hit, or we'd have had more complaints about it.

BTW, I had what seemed like a great idea to improve the situation in
the left-deep-tree case I talked about: we could remember the places
where we'd had to use the NFA to check a backreference subre, and
then at the point where we capture the reference string, recheck any
previous approximate answers, and fail the capture node's match when
any previous backref doesn't match. Turns out this only mostly works.
In corner cases where the match is ambiguous, it can change the
results from what we got before, which I don't think is acceptable.
Basically, that allows the backref node to have action-at-a-distance
effects on where the earlier concat node divides a substring, which
changes the behavior.

So it seems this is about the best we can do for now. I'll wait
a little longer to see if anyone complains about the proposed
semantics change, though.

regards, tom lane

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Borisov 2021-03-01 20:23:23 Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.
Previous Message Mark Dilger 2021-03-01 20:20:31 Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.