Re: Future of our regular expression code

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Future of our regular expression code
Date: 2012-02-20 05:04:14
Message-ID: 4F41D44E.8080601@2ndQuadrant.com
Lists: pgsql-hackers

On 02/19/2012 10:28 PM, Greg Stark wrote:
> One thing that concerns me more and more is that most sufficiently
> powerful regex implementations are susceptible to DOS attacks.

There's a list of "evil regexes" at http://en.wikipedia.org/wiki/ReDoS
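
To make the failure mode concrete, here is a toy sketch in Python. It is
not our regex engine or any real library's; it is a hand-written
backtracking matcher hard-coded for the pattern (a+)+$, a commonly cited
example of this kind. The function name and the step counter are made up
for illustration. It counts how many ways the matcher tries to split a
run of "a" characters before giving up:

```python
def count_backtracking_steps(text):
    """Toy backtracking matcher for the fixed pattern (a+)+$.

    Returns (matched, steps), where steps counts how many candidate
    end positions for one iteration of the inner a+ were tried.
    This is an illustrative model, not any real regex engine.
    """
    steps = 0

    def match_groups(pos):
        nonlocal steps
        # Find how far a run of 'a' extends from pos.
        end = pos
        while end < len(text) and text[end] == "a":
            end += 1
        # Greedy first: try the longest a+ and back off one char at a time.
        for stop in range(end, pos, -1):
            steps += 1
            if stop == len(text):
                return True          # $ matches at end of input
            if match_groups(stop):   # try another iteration of (a+)
                return True
        return False

    return match_groups(0), steps


if __name__ == "__main__":
    for n in (5, 10, 15):
        # A trailing '!' forces the match to fail after exploring every split.
        print(n, count_backtracking_steps("a" * n + "!"))
```

In this model an input of n "a" characters followed by one non-matching
character costs 2^n - 1 attempts, while an input that matches succeeds on
the first attempt. A short hostile string is enough to tie up a backend.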

The Perl community's reaction to Russ Cox's regex papers has some
interesting comments along these lines too:
http://www.perlmonks.org/?node_id=597262

That thread brings up the backreference concerns Tom already mentioned.
Someone also points out that the Thompson NFA Cox advocates in his first
article can use an excessive amount of memory when processing Unicode:
http://www.perlmonks.org/?node_id=597312

As an aside, Cox's "Regular Expression Matching with a Trigram Index" is
an interesting intro to trigram use for FTS purposes, and might have some
inspirational ideas for further progress in that area:
http://swtch.com/~rsc/regexp/regexp4.html

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
