From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | "Joel Jacobson" <joel(at)compiler(dot)org> |
Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Regex performance regression induced by match-all code |
Date: | 2021-05-02 16:53:16 |
Message-ID: | 3709847.1619974396@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
"Joel Jacobson" <joel(at)compiler(dot)org> writes:
> On Sat, May 1, 2021, at 21:46, Tom Lane wrote:
>> I found a nasty performance problem in commit 824bf7190: given the
>> right sort of regex, checkmatchall() takes an unreasonable amount
>> of time.
> Nice catch.
> I've successfully tested the patch on the regex corpus:
Thanks for testing!
>> The main thing I find a bit ugly about this solution is that
>> I'm abusing the state->tmp fields (which are declared struct state *)
>> to hold the memoization results (which are "bool *"). It'd be
>> possible to avoid that by allocating an additional "bool **" array
>> indexed by state number, but whether it's worth that depends on how
>> allergic you are to weird pointer casts.
I tried rewriting it like that, and I have to say I do like it better
that way. I think it's clearer, which seems well worth one extra
malloc.
Also, I poked a little more at the question of the heuristic limit
on number of states, by checking the actual numbers of states in
various ways of writing matchall regexes. It looks to me like
we can cut the limit to DUPINF*2 and still have lots of daylight,
because reasonable (and even not so reasonable) ways to write a
pattern that can match up to K characters seem to come out with
not much more than K states.
Hence, PFA v2. I also added a couple of test cases based on
looking at code coverage in this area, as well as a case that
takes an unreasonably long time without this fix.
regards, tom lane
Attachment | Content-Type | Size |
---|---|---|
fix-exponential-cost-of-checkmatchall-2.patch | text/x-diff | 21.3 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Alexander Korotkov | 2021-05-02 17:45:18 | Re: websearch_to_tsquery() returns queries that don't match to_tsvector() |
Previous Message | vignesh C | 2021-05-02 16:34:27 | Re: Identify missing publications from publisher while create/alter subscription. |