Re: Some regular-expression performance hacking

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Joel Jacobson <joel(at)compiler(dot)org>
Cc: Chapman Flack <chap(at)anastigmatix(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Some regular-expression performance hacking
Date: 2021-02-24 23:19:52
Message-ID: 4185677.1614208792@sss.pgh.pa.us
Lists: pgsql-hackers

I wrote:
> On my machine, the combination of these two ideas reduces the
> runtime of the example above from ~150 seconds to ~53 seconds,
> or nearly 3x better. I see something like a 2% improvement on
> Joel's test corpus, which might just be noise. So this isn't
> any sort of universal panacea, but it sure helps when LACON
> evaluation is the bottleneck.

After another round of testing, I really can't see any improvement
at all from that patch on anything except the original Tcl test
case. Indeed, a lot of cases seem very slightly worse, perhaps
because compact() now has to make two passes over all the arcs.
So that's leaving me a bit dissatisfied with it; I'm going to
stick it on the back burner for now, in hopes of a better idea.

However, in a different line of thought, I realized that the
memory allocation logic could use some polishing. It gives out
ten arcs per NFA state initially, and then adds ten more at a time.
That's not very bright when you look at the actual usage patterns,
though, because most states have only one or two out-arcs, while
some have lots and lots. I instrumented things to gather stats
about arcs-per-state on your larger corpus, and I got this, where
the second column is the cumulative fraction of states having the
given number of arcs or fewer (a sketch of the tally appears after
the table):

 arcs | cum_fraction
------+------------------------
    0 | 0.03152871318455725868
    1 | 0.55852399556959499493
    2 | 0.79408539124378449284
    3 | 0.86926656199366447221
    4 | 0.91726891675794579062
    5 | 0.92596934405572457792
    6 | 0.93491612836055807037
    7 | 0.94075102352639209644
    8 | 0.94486598829672779379
    9 | 0.94882085883928361399
   10 | 0.95137992908336444821
   11 | 0.95241399914559696173
   12 | 0.95436547669138874594
   13 | 0.95534682472329051385
   14 | 0.95653340893356523452
   15 | 0.95780804864876924571
   16 | 0.95902387577636979702
   17 | 0.95981494467267418552
   18 | 0.96048662216159976997
   19 | 0.96130294229052153065
   20 | 0.96196856160309755204
  ...
 3238 | 0.99999985870142624926
 3242 | 0.99999987047630739515
 4095 | 0.99999987342002768163
 4535 | 0.99999987930746825457
 4642 | 0.99999988225118854105
 4706 | 0.99999989402606968694
 5890 | 0.99999989696978997342
 6386 | 0.99999990874467111931
 7098 | 0.99999991168839140579
 7751 | 0.99999994701303484347
 7755 | 0.99999998233767828116
 7875 | 0.99999998822511885410
 8049 | 1.00000000000000000000
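
(For reference, here is a minimal C sketch of the kind of tally
that produces those cumulative fractions. The names are illustrative
assumptions, not taken from the actual instrumentation.)

/*
 * Sketch only: hist[n] holds the number of states with exactly n
 * out-arcs; total is the overall state count.  Prints the cumulative
 * fraction of states having n arcs or fewer, as in the table above.
 */
#include <stdio.h>

static void
print_cum_fractions(const long *hist, int max_arcs, long total)
{
    long    running = 0;
    int     n;

    for (n = 0; n <= max_arcs; n++)
    {
        running += hist[n];
        if (hist[n] > 0)
            printf("%5d | %.20f\n", n, (double) running / total);
    }
}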

So it seemed clear to me that we should only give out a couple of arcs
per state initially, but then let it ramp up faster than 10 arcs per
additional malloc. After a bit of fooling I have the attached.
This does nothing for the very largest examples in the corpus (the
ones that cause "regex too complex") --- those were well over the
REG_MAX_COMPILE_SPACE limit before and they still are. But all the
rest get nicely smaller. The average pg_regcomp memory consumption
drops from ~89K to ~48K.
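
To make the shape of that policy concrete, here is a minimal C
sketch of a ramped batch-size rule along those lines. The constants
and function name are illustrative assumptions, not the contents of
the attached patch.

/*
 * Sketch only: start each state with a small arc batch (most states
 * need just one or two arcs, per the table above), then double the
 * batch size on each further allocation, so that an arc-heavy state
 * needs only O(log n) mallocs rather than n/10.
 */
#define FIRST_ARC_BATCH 2
#define MAX_ARC_BATCH   1024

static int
next_arc_batch(int prev_batch)
{
    if (prev_batch <= 0)
        return FIRST_ARC_BATCH;        /* first allocation for a state */
    if (prev_batch >= MAX_ARC_BATCH / 2)
        return MAX_ARC_BATCH;          /* cap growth to bound waste */
    return prev_batch * 2;             /* otherwise double the batch */
}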

regards, tom lane

Attachment Content-Type Size
0007-smarter-arc-allocation.patch text/x-diff 5.7 KB
