Re: Regular expression character escape

From: "David Johnston" <polobo(at)yahoo(dot)com>
To: "'Ronan Dunklau'" <rdunklau(at)gmail(dot)com>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: Regular expression character escape
Date: 2012-02-24 15:38:34
Message-ID: 009401ccf30a$5eb737f0$1c25a7d0$@yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

-----Original Message-----
From: pgsql-general-owner(at)postgresql(dot)org [mailto:pgsql-general-owner(at)postgresql(dot)org] On Behalf Of Ronan Dunklau
Sent: Friday, February 24, 2012 6:34 AM
To: pgsql-general(at)postgresql(dot)org
Subject: [GENERAL] Regular expression character escape

Hello.

I'd like to perform a query using user-submitted input in a regular expression.

Something along the lines of:

select some_col
from some_table
where some_col ~ ('^' || user_submitted_input || '\d*$')

This query is looking for every value matching the user submitted input with optional trailing decimal characters.

My problem is: the user can break this using special regular expression characters (ex: 'test(').

I'm looking for a way to escape regular expression characters from the input (maybe a function already exists for that purpose?).

I could do it on the client side, but I'm not really sure how python and postgresql flavors of regular expressions differ.

I've seen that the (?q) modifier is supported (treat the rest of the regexp as normal characters), but unless I missed something it I can't use it anywhere else than at the beginning of the regexp.

From the docs at
http://www.postgresql.org/docs/current/static/functions-matching.html:

"Embedded options take effect at the ) terminating the sequence. They can appear only at the start of an ARE."

Any idea on how to achieve this ?

Thank you.

--
Ronan Dunklau

------------------------------------------------------------------------------------------------

How about:

WHERE some_col LIKE (user_submitted_input || '%') AND some_col ~ ('^.{' || length_of_user_submitted_input || '}\d*$')

I'd have some reservations regarding multi-byte characters however - but this avoids any escaping of the input string.

You could (should?) write the escaping routine on the server side in a user-defined function:

WHERE some_col ~ ('^' || make_regexp_literal(user_submitted_stringliteral) || '\d*$')

David J.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Durumdara 2012-02-24 15:39:39 Stability in Windows?
Previous Message Adrian Klaver 2012-02-24 15:33:01 Re: Having a problem with RoR-3.1.1 and Pg-0.1