Re: Some regular-expression performance hacking

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Joel Jacobson" <joel(at)compiler(dot)org>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Some regular-expression performance hacking
Date: 2021-02-14 16:45:40
Message-ID: 1929732.1613321140@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

"Joel Jacobson" <joel(at)compiler(dot)org> writes:
> I've successfully tested both patches against the 1.5M regexes-in-the-wild dataset.
> Out of the 1489489 (pattern, text string) pairs tested,
> there was only one single deviation:
> This 100577 bytes big regex (pattern_id = 207811)...
> ...
> ...previously raised...
> error invalid regular expression: regular expression is too complex
> ...but now goes through:

> Nice. The patched regex engine is apparently capable of handling even more complex regexes than before.

Yeah. There are various limitations that can lead to REG_ETOOBIG, but the
main ones are "too many states" and "too many arcs". The RAINBOW change
directly reduces the number of arcs and thus makes larger regexes feasible.
I'm sure it's coincidental that the one such example you captured happens
to be fixed by this change, but hey I'll take it.

regards, tom lane

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Dave Cramer 2021-02-14 17:35:48 Re: Extensibility of the PostgreSQL wire protocol
Previous Message Ranier Vilela 2021-02-14 14:39:47 Re: pg_cryptohash_final possible out-of-bounds access (per Coverity)