From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Joel Jacobson <joel(at)compiler(dot)org>
Cc: Chapman Flack <chap(at)anastigmatix(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Some regular-expression performance hacking
> However, in a different line of thought, I realized that the
> memory allocation logic could use some polishing. It gives out
> ten arcs per NFA state initially, and then adds ten more at a time.
> However, that's not very bright when you look at the actual usage
> patterns, because most states have only one or two out-arcs,
> but some have lots and lots.
Hold the phone ... after a bit I started to wonder why Spencer made
arc allocation be per-state at all, rather than using one big pool
of arcs. Maybe there's some locality-of-reference argument to be
made for that, but I doubt he was worrying about that back in the
90s. Besides, the regex compiler spends a lot of time iterating
over in-chains and color-chains, not just out-chains; it's hard
to see why trying to privilege the latter case would help much.
What I suspect, based on this old comment in regguts.h:
* Having a "from" pointer within each arc may seem redundant, but it
* saves a lot of hassle.
is that Henry did it like this initially to save having a "from"
pointer in each arc, and never re-thought the allocation mechanism
after he gave up on that idea.
So I rearranged things to allocate arcs out of a common pool, and for
good measure made the state allocation code do the same thing. I was
pretty much blown away by the results: not only is the average-case
space usage about half what it is on HEAD, but the worst-case drops
by well more than a factor of ten. I'd previously found, by raising
REG_MAX_COMPILE_SPACE, that the regexes in the second corpus that
trigger "regex too complex" errors all need 300 to 360 MB to compile
with our HEAD code. With the new patch attached, they compile
successfully in a dozen or so MB. (Yesterday's patch really did
nothing at all for these worst-case regexes, BTW.)
I also see about a 10% speedup overall, which I'm pretty sure is
down to needing fewer interactions with malloc() (this is partially
a function of having batched the state allocations, of course).
So even if there is a locality-of-reference loss, it's swamped by
fewer mallocs and less total space used.
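To illustrate the idea, here is a minimal sketch of batched pool allocation in the spirit described above: rather than growing a small per-state arc array ten entries at a time, carve fixed-size arc structs out of large shared chunks, so a whole NFA costs a handful of malloc() calls instead of one or more per state. The names (arc, arcbatch, arcpool) and the batch size are illustrative assumptions, not the actual regex-engine code.

```c
#include <stdlib.h>
#include <stddef.h>

struct arc {
    int co;                  /* color (placeholder field) */
    struct arc *outchain;    /* other chain pointers elided */
};

#define ARCS_PER_BATCH 1024  /* illustrative chunk size */

struct arcbatch {
    struct arcbatch *next;   /* chain of all batches, for bulk free */
    size_t nused;            /* arcs handed out from this batch */
    struct arc arcs[ARCS_PER_BATCH];
};

struct arcpool {
    struct arcbatch *batches;  /* newest batch first */
    size_t nmallocs;           /* malloc() calls made, to show savings */
};

/* Hand out one arc, grabbing a fresh batch only when the current
 * one is exhausted. */
static struct arc *pool_alloc_arc(struct arcpool *pool)
{
    struct arcbatch *b = pool->batches;

    if (b == NULL || b->nused >= ARCS_PER_BATCH)
    {
        b = malloc(sizeof(struct arcbatch));
        if (b == NULL)
            return NULL;
        b->nused = 0;
        b->next = pool->batches;
        pool->batches = b;
        pool->nmallocs++;
    }
    return &b->arcs[b->nused++];
}

/* Free every arc at once by walking the batch chain -- individual
 * arcs are never returned to malloc(). */
static void pool_free_all(struct arcpool *pool)
{
    struct arcbatch *b = pool->batches;

    while (b)
    {
        struct arcbatch *next = b->next;
        free(b);
        b = next;
    }
    pool->batches = NULL;
}
```

With a scheme like this, 3000 arcs cost three chunk allocations rather than 3000 small ones, which is the kind of reduction in malloc() traffic the speedup above is attributed to.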
regards, tom lane