Re: Some regular-expression performance hacking

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Joel Jacobson" <joel(at)compiler(dot)org>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Some regular-expression performance hacking
Date: 2021-02-13 17:35:45
Message-ID: 1662033.1613237745@sss.pgh.pa.us
Lists: pgsql-hackers

"Joel Jacobson" <joel(at)compiler(dot)org> writes:
> In total, I scraped the first-page of some ~50k websites,
> which produced 45M test rows to import,
> which when GROUP BY pattern and flags was reduced
> down to 235k different regex patterns,
> and 1.5M different text string subjects.

This seems like an incredibly useful test dataset.
I'd definitely like a copy.

> No is_match differences were detected, good!

Cool ...

> However, there were 23 cases where what got captured differed:

I shall take a closer look at that.

Many thanks for doing this work!

regards, tom lane
