Re: Notes about fixing regexes and UTF-8 (yet again)

From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: NISHIYAMA Tomoaki <tomoakin(at)staff(dot)kanazawa-u(dot)ac(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Notes about fixing regexes and UTF-8 (yet again)
Date: 2012-02-18 23:01:37
Message-ID: m2hayng2zy.fsf@2ndQuadrant.fr
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> Yeah, it's conceivable that we could implement something whereby
> characters with codes above some cutoff point are handled via runtime
> calls to iswalpha() and friends, rather than being included in the
> statically-constructed DFA maps. The cutoff point could likely be a lot
> less than U+FFFF, too, thereby saving storage and map build time all
> round.
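
For illustration only, the cutoff scheme described above could be sketched in C roughly as follows. This is not the regex engine's actual code: the names CLASS_CUTOFF, alpha_map, build_alpha_map and is_alpha_char are hypothetical, the cutoff value is arbitrary, and the sketch assumes wchar_t values correspond to Unicode code points, which is platform-dependent. The results of iswalpha() also depend on the current locale.

```c
#include <stdbool.h>
#include <stdint.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical cutoff: codes below it go into a prebuilt map,
 * codes at or above it are classified at run time. */
#define CLASS_CUTOFF 0x0800

/* One bit per character code below the cutoff. */
static uint8_t alpha_map[CLASS_CUTOFF / 8];
static bool alpha_map_ready = false;

/* Build the small map once; cost is proportional to the cutoff,
 * not to the size of the whole character set. */
static void
build_alpha_map(void)
{
    for (uint32_t c = 0; c < CLASS_CUTOFF; c++)
    {
        if (iswalpha((wint_t) c))
            alpha_map[c >> 3] |= (uint8_t) (1u << (c & 7));
    }
    alpha_map_ready = true;
}

/* Classify a character: map lookup below the cutoff,
 * runtime call to iswalpha() at or above it. */
static bool
is_alpha_char(uint32_t c)
{
    if (c < CLASS_CUTOFF)
    {
        if (!alpha_map_ready)
            build_alpha_map();
        return (alpha_map[c >> 3] >> (c & 7)) & 1;
    }
    return iswalpha((wint_t) c) != 0;
}
```

Lowering CLASS_CUTOFF shrinks the map and its build time, at the price of more runtime calls for characters above it.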

It's been proposed to build a “regexp” type in PostgreSQL which would
store the DFA directly and provide some way to run that DFA out of its
“storage” without recompiling.

Would such a mechanism be useful here? Would it be useful only when
storing the regexp in a column somewhere then applying it in the query
from there (so most probably adding a join or subquery somewhere)?

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
