Re: Some regular-expression performance hacking

From: Noah Misch <noah(at)leadboat(dot)com>
To: Joel Jacobson <joel(at)compiler(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Some regular-expression performance hacking
Date: 2021-03-06 18:09:25
Message-ID: 20210306180925.GA2345664@rfd.leadboat.com
Lists: pgsql-hackers

On Sat, Feb 13, 2021 at 06:19:34PM +0100, Joel Jacobson wrote:
> To test the correctness of the patches,
> I thought it would be nice to have some real-life regexes,
> and just as important, some real-life text strings
> to which the real-life regexes are applied.
>
> I therefore patched Chromium's V8 regex engine
> to log the actual regexes that get compiled when
> visiting websites, and also the text strings that
> the regexes are applied to at run time
> when the regexes are executed.
>
> I logged the regex and text strings as base64 encoded
> strings to STDOUT, to make it easy to grep out the data,
> so it could be imported into PostgreSQL for analytics.
>
> In total, I scraped the first page of some ~50k websites,
> which produced 45M test rows to import,
> which, after a GROUP BY on pattern and flags, was reduced
> to 235k different regex patterns
> and 1.5M different text string subjects.

It's great to see this kind of testing. Thanks for doing it.
