Re: Some regular-expression performance hacking

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Joel Jacobson" <joel(at)compiler(dot)org>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Some regular-expression performance hacking
Date: 2021-02-13 21:11:37
Message-ID: 1813969.1613250697@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

"Joel Jacobson" <joel(at)compiler(dot)org> writes:
> No is_match differences were detected, good!
> However, there were 23 cases where what got captured differed:

These all stem from the same oversight: checkmatchall() was being
too cavalier by ignoring "pseudocolor" arcs, which are arcs that
match start-of-string or end-of-string markers. I'd supposed that
pseudocolor arcs necessarily match parallel RAINBOW arcs, because
they start out that way (cf. newnfa). But it turns out that
some edge-of-string constraints can be optimized in such a way that
they only appear in the final NFA in the guise of missing or extra
pseudocolor arcs. We have to actually check that the pseudocolor arcs
match the RAINBOW arcs, otherwise our "matchall" NFA isn't one because
it acts differently at the start or end of the string than it does
elsewhere.

So here's a revised pair of patches (0001 is actually the same as
before).

Thanks again for testing!

regards, tom lane

Attachment Content-Type Size
0001-invent-rainbow-arcs-2.patch text/x-diff 19.6 KB
0002-recognize-matchall-NFAs-2.patch text/x-diff 21.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Zhihong Yu 2021-02-13 21:12:25 Re: Possible dereference null return (src/backend/replication/logical/reorderbuffer.c)
Previous Message Ranier Vilela 2021-02-13 20:59:32 Re: Possible dereference null return (src/backend/replication/logical/reorderbuffer.c)