Regular expression query

From: Rodger Donaldson <rodgerd(at)diaspora(dot)gen(dot)nz>
To: pgsql-sql(at)postgresql(dot)org
Subject: Regular expression query
Date: 2000-08-25 03:13:34
Message-ID: 20000825151333.A26833@diaspora.gen.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql


I have a large group of records which were entered with trailing garbage, in
the form of superfluous \n.

The main problem this has caused, other than the obvious one, is that the
record in question is being used as a primary key, so some duplicates have
slipped through. I assumed a simple statement like:

SELECT url
FROM sites
WHERE url ~ url || '\\s+'

...would allow me to find all the duplicate-but-not-quite records.

While this concatenation works with the LIKE directive (ie LIKE url || '%'),
postgresql barfs on it in a regexp with the error:

ERROR: Unable to identify an operator '||' for types 'bool' and 'unknown'
You will have to retype this query using an explicit cast

Encapsulating the concatenation in brackets leads the query parser to stop
bleating. I presume this is simply a limitation in the parser's ability to
make inferences about regexps vs. LIKEs.

The other aspect of this is that it seems that postgresql's regexp engine
doesn't understand some expected regexps; I've tried both escaped and
unescaped versions of, eg \w, \s, \n and so on a pg seems to ignore them.
Am I exceeding the capabilities of the regexp parser?

--
Rodger Donaldson rodgerd(at)diaspora(dot)gen(dot)nz
I just had this vision of a young boy cowering in terror, whispering:
"I see dumb people"
-- Steve VanDevender

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Yury Don 2000-08-25 03:27:32 Re: weird structure
Previous Message Tatsuo Ishii 2000-08-25 01:07:27 Re: sorting in UNICODE table