From: | "Joel Jacobson" <joel(at)compiler(dot)org> |
---|---|
To: | pgsql-hackers(at)lists(dot)postgresql(dot)org, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Subject: | Re: Test harness for regex code (to allow importing Tcl's test suite) |
Date: | 2021-01-04 13:30:02 |
Message-ID: | 331c0e91-f151-4766-b526-cc2c9f84d5d7@www.fastmail.com |
Lists: | pgsql-hackers |
On Mon, Jan 4, 2021, at 04:49, Tom Lane wrote:
>Over the holiday break I've been fooling with some regex performance
>improvements.
Cool! I've also been fooling with regex performance over the years myself, not in the PostgreSQL code, but in general.
More specifically, I first DFA-minimize the regex,
and then generate LLVM IR directly from the resulting graph.
Perhaps some of the ideas could be interesting to look at.
Here is a live demo: https://compiler.org/reason-re-nfa/src/index.html
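To illustrate what the DFA-minimization step involves, here is a minimal Python sketch using Moore-style partition refinement. It is not the code behind the demo; the function name `minimize_dfa`, the dict-based DFA encoding, and the example automaton for `ab|cb` are all assumptions made for illustration. It also assumes a complete DFA (every state has a transition on every symbol) in which all states are reachable.

```python
def minimize_dfa(states, alphabet, delta, start, accepting):
    """Merge equivalent states of a complete DFA (hypothetical helper).

    states:    list of state ids
    alphabet:  iterable of symbols
    delta:     dict mapping (state, symbol) -> state, defined for every pair
    Returns (block_of, new_delta, new_start, new_accepting).
    """
    alphabet = list(alphabet)
    # Initial partition: accepting states vs. everything else.
    block_of = {s: int(s in accepting) for s in states}
    while True:
        ids = {}
        refined = {}
        for s in states:
            # Two states stay together only if they are in the same block
            # and every symbol leads them into the same block.
            signature = (block_of[s],) + tuple(
                block_of[delta[s, c]] for c in alphabet
            )
            refined[s] = ids.setdefault(signature, len(ids))
        # Refinement only ever splits blocks, so an unchanged count
        # means the partition is stable.
        stable = len(ids) == len(set(block_of.values()))
        block_of = refined
        if stable:
            break
    new_delta = {
        (block_of[s], c): block_of[delta[s, c]] for s in states for c in alphabet
    }
    new_accepting = {block_of[s] for s in accepting}
    return block_of, new_delta, block_of[start], new_accepting


# Example: a hand-built DFA for ab|cb with an explicit dead state 4.
states = [0, 1, 2, 3, 4]
alphabet = "abc"
delta = {(s, c): 4 for s in states for c in alphabet}  # default: dead state
delta[0, "a"] = 1
delta[0, "c"] = 2
delta[1, "b"] = 3
delta[2, "b"] = 3

block_of, new_delta, new_start, new_accepting = minimize_dfa(
    states, alphabet, delta, start=0, accepting={3}
)
# States 1 and 2 behave identically, so they end up in the same block.
```

Partition refinement was chosen here only because it is short; Hopcroft's algorithm does the same job with better worst-case complexity.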
One idea that I came up with myself is the "merge_linear" step,
where when possible, multiple characters are read in the same operation.
Not sure if other regex JIT engines do this, but it makes quite a difference
for large regexes that contain long strings.
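As a rough illustration of the idea, the Python sketch below collapses chains of single-character DFA edges into multi-character edges. It is only a guess at what such a pass could look like, not the actual merge_linear implementation: the function names, the dict-based graph encoding, and the rule for which states count as "linear" (one incoming edge, one outgoing edge, neither start nor accepting) are all assumptions.

```python
from collections import defaultdict


def merge_linear(transitions, start, accepting):
    """Collapse pass-through states so one edge can consume several characters.

    transitions: dict mapping state -> {char: next_state}
    Returns a dict mapping state -> {string: next_state}.
    """
    # Count incoming edges for every state.
    indegree = defaultdict(int)
    for edges in transitions.values():
        for target in edges.values():
            indegree[target] += 1

    def is_linear(state):
        # A pass-through state: one way in, one way out,
        # and it is neither the start state nor an accepting state.
        return (
            state != start
            and state not in accepting
            and indegree[state] == 1
            and len(transitions.get(state, {})) == 1
        )

    merged = {}
    for state, edges in transitions.items():
        if is_linear(state):
            continue  # absorbed into the edge that leads into it
        merged[state] = {}
        for char, target in edges.items():
            label = char
            # Extend the label while the target is a pass-through state.
            while is_linear(target):
                ((next_char, next_target),) = transitions[target].items()
                label += next_char
                target = next_target
            merged[state][label] = target
    return merged


def matches(merged, start, accepting, text):
    """Run the merged automaton, comparing whole labels in one step."""
    state, pos = start, 0
    while pos < len(text):
        for label, target in merged.get(state, {}).items():
            if text.startswith(label, pos):
                state, pos = target, pos + len(label)
                break
        else:
            return False  # no edge matches at this position
    return state in accepting


# Example: a minimized DFA for hello|help (state 6 is the only accepting state).
transitions = {
    0: {"h": 1},
    1: {"e": 2},
    2: {"l": 3},
    3: {"l": 4, "p": 6},
    4: {"o": 6},
}
merged = merge_linear(transitions, start=0, accepting={6})
# merged == {0: {"hel": 3}, 3: {"lo": 6, "p": 6}}
```

In generated machine code, a multi-character label such as "hel" could presumably be checked with a single wider load and compare instead of three separate byte comparisons, which is where the speedup would come from.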
Note: There is no support for capture groups, back-references, etc., but | + * () [] [^] work.
/Joel