Re: Regular Expression INTRO [long] [OT]

From: will trillich <will(at)serensoft(dot)com>
To: pgsql <pgsql-general(at)postgresql(dot)org>
Subject: Re: Regular Expression INTRO [long] [OT]
Date: 2002-04-10 21:38:54
Message-ID: 20020410163854.A5944@serensoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Tue, Apr 09, 2002 at 06:05:43PM -0600, will trillich wrote:
[snip]
> With LIKE, your search values can use the percent (%)
> to tell PostgreSQL that anything can occupy that spot
> -- one character, twelve symbols or zero digits -- and
> still satisfy the search.
>
> On Unix or Linux, this is basically the same as the
> ASTERISK (*) at the command line, when dealing with
> file names:
>
> # list all files whose names begin with '.bash'
> ls .bash*
>
> # list all files containing 'out' anywhere in the name
> ls *out*
>
> # list all file names ending with '.pl'
> ls *.pl
>
> # list file starting with 'proj', ending with '.c'
> ls proj*.c

i'm torn between leaving in a full set of shell-glob samples to
pair up with postgres-like samples, and using only one (as
recommended by Holger Klawitter -- thanks!)...

> With PostgreSQL using the LIKE operator, use the
> percent, instead:
>
> -- list all customers within the 47610 postal code
> SELECT * FROM cust WHERE zip LIKE '47610%';
>
> -- display customers who have 'Corp' in their names
> SELECT * FROM cust WHERE name LIKE '%Corp%';
>
> -- show customers whose names end in 'LLC'
> SELECT * FROM cust WHERE name LIKE '%LLC';
>
> -- documents beginning with 'We', ending with 'rica'
> SELECT * FROM doc WHERE contents LIKE 'We%rica';
>
> Wherever the '%' appears (using the LIKE operator)
> Postgres allows anything at all to match -- from a
> lengthy string of text, to one single character, to
> a zero-length string -- i.e. nothing at all.
>
> ...ILIKE 'A%Z'
> -- matches 'AZ' and 'A to Z' and 'ABC2468XYZ' and 'A(at)$*Z'

whoops -- this shoulda been LIKE, not ILIKE (at least not yet)

[snip]

> So what's with the BACKSLASH in front of the dot? Well,
> just as LIKE has the UNDERSCORE (_) to denote "any single
> character", REGULAR EXPRESSIONS use the DOT (.) for
> that very same purpose. So we have to "escape" the
> dot to alter its normal interpretation, using the.

using the what? um, "...using the backslash." oops!

[snip]

> And for very common cases, there are handy abbreviations:
>
> 'x{1,1}' 'x' one (the default)
> 'r{0,1}' 'r?' zero or one, i.e. 'x' is optional

should be "'r' is optional". (thanks, Holger)

> 'B{0,}' 'B*' zero or more
> 'z{1,}' 'z+' one or more

may as well flesh these out, to be parallel with the others:

'B{0,}' 'B*' zero or more B's
'z{1,}' 'z+' one or more z's

[snip]

> Note that we've prefixed the previous pattern with
>
> '[^0-9]'
>
> because, within the square brackets of a character class,
> the CARAT (^) means "anything EXCEPT..."
>
> So now, 00000-0000 will NOT match; this is what we're
> after.

maybe i could be more clear, by saying:

So now, our pattern will NOT match something like
00000-0000, and since we don't want it to, we're making
progress.

[snip]

> And, just as in the LIKE operator, unless regular expressions
> ARE anchored at the beginning of a field, you'll defeat any
> index you have for that field. Indexes help alphabetize by
> comparing the beginning of your fields to each other; unless
> you're looking at the beginning of the field, your index can't
> be used.

some of you asked if regexes are actually able to use the index.
well, here's the answer: if they're anchored at the front, YES!

EXPLAIN
repo=# explain
repo-# select * from _table_1015197075 where str ~ '^f';
NOTICE: QUERY PLAN:

Index Scan using _table_ix on _table_1015197075 (cost=0.00..2.01 rows=1 width=24)

EXPLAIN
repo=# explain
repo-# select * from _table_1015197075 where str ~ 'f';
NOTICE: QUERY PLAN:

Seq Scan on _table_1015197075 (cost=100000000.00..100000001.03 rows=1 width=24)

to be honest, i "SET ENABLE_SEQSCAN = false" and created a
quickie table with about fifty rows, with an index on the "str"
column.

with '^f' it DID use the index; without the carat, 'f', it
absolutely COULD NOT use the index... imagine looking for words
in the dictionary that /contain/ the letter 'f', right? (no
mystery here -- but whether the '^f' COULD use the index was the
question, of course -- and it did).

cool. (i suspected, but wasn't certain.)

[snip]

> There are ways to anchor your searches to word boundaries,
> as well -- not just beginning-of-field and end-of-field. See
> your documentation for details.

rewording is in order here -- how about:

There are ways to anchor your searches to word boundaries,
as well -- you're not restricted to testing only for
beginning-of-field or end-of-field. (In fact, that's a more
likely solution to the problem than what I devised here.)
See your documentation for information on [[:<:]] and
[[:>:]].

[snip]

> As for [1] email stuff, it can be (zero or more of):
>
> '[\\-\\.]'
>
> hyphen or dot, followed by
>
> '[a-z0-9_]'
>
> alphanumerics (or underscore)
>
> '+'
>
> one or more times.

how about a little added explanation for clarity?

This means an address (before the @domain.com) can be
alphanumerics only, or if it contains a dot or hyphen that
more alphanumerics must follow the dot or hyphen. So 'me'
and 'my.self' and 'albert.einstein-newton-john.jr' are all
acceptable to this part of the pattern.

> And as for [2] site stuff, it can be (one or more of):
>
> '[\\-\\.]'
>
> hyphen or dot, followed by
>
> '[a-z0-9_]'
>
> alphanumerics (or underscore)
>
> '+'
>
> one or more times.

more clarity -- something like:

So the site stuff means that the domain (@some.where) MUST
contain at least one dot or hyphen, each of which must be
followed by alphanumerics. These would NOT match:

'me..you' -- no alphanumerics after first dot
'oops-' -- no alphanumerics after hyphen
'-eesh' -- need alphanumerics before hyphen

To be strict, that last one WILL match the 'site stuff'
fragment of the pattern, but it'll fail in the whole pattern,
because after the '@' we require SOME alphanumerics.

feel free to post comments/feedback to pgsql-user...

--
"We will fight them on the beaches, we will fight them on the
sons of beaches" -- Miguel Churchill, Winston's bastard Mexican
brother.
--lifted from http://www.astray.com/acmemail/stable/documentation.xml


will(at)serensoft(dot)com
http://sourceforge.net/projects/newbiedoc -- we need your brain!
http://www.dontUthink.com/ -- your brain needs us!

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Damon Cokenias 2002-04-10 21:40:15 View INSERTs not returning OID in Postgres 7.2.1
Previous Message Bruce Momjian 2002-04-10 21:37:56 Re: refcursor returned by pl/psql to jdbc