Re: [GENERAL] Determining if "in a text set"

From: Herouth Maoz <herouth(at)oumail(dot)openu(dot)ac(dot)il>
To: Richi Plana <richip(at)mozcom(dot)com>, pgsql-sql(at)postgreSQL(dot)org
Subject: Re: [GENERAL] Determining if "in a text set"
Date: 1999-03-28 17:24:50
Message-ID: l0311070cb3240c3166d7@[147.233.148.142]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-sql

I am redirecting the discussion to the SQL list, as it is more appropriate
than the GENERAL list.

At 06:41 +0200 on 27/03/1999, Richi Plana wrote:

> I have a table/class with a text field which contains a single word
> (possibly trailed by whitespaces). Given a phrase (a text literal composed
> of words separated by one or more whitespaces), what's the best way to
> SELECT those tuples with fields containing instances that match exactly
> one of the words in the given string?
>
> Example:
>
> Table:
> Tuple# ... FieldN
> 1 ... 'MON'
> 2 ... 'TUE'
> 3 ... 'THURS'
> 4 ... 'THU'
> 5 ... 'FRI'
>
> Given phrase:
> 'MON TUE WED THURS'
> Tuple 1 will match
> Tuple 2 will match
> Tuple 3 will match
> Tuple 4 will NOT match
> Tuple 5 will NOT match
>
> Right now, I'm extracting (using strtok(), 8^P) each word from the given
> phrase and using "SELECT * FROM {class} where FieldN LIKE '%{word}%'", but
> it's terribly slow, relies on strtok() to parse words, and Tuple 4 above
> will match.

First, do without the strtok. You could achieve the same result (that also
matches tuple 4) with:

SELECT * FROM {class}
WHERE '{phrase}' LIKE '%' || ( rtrim( FieldN ) || '%' );

Hehe... the strange expression simply cuts off spaces from the right side
of the field, and concatenates % signs on its left and right. So you match
the phrase against the keyword, not the other way around.

It is more readable if you use regular expression rather than 'like':

SELECT * FROM {class}
WHERE '{phrase}' ~ rtrim( FieldN );

Now, turning attention to eliminating the fourth tuple from matching: I'm
not an expert on regular expressions. I think postgres does not support
perl's concept of word boundaries. But if the phrase is supposed to contain
just spaces (ascii 32, if you will) as word separators, I'd match against
the field with a space on the left and nothing on the right, a space on the
right and nothing on the left, spaces on both sides, or standing alone:

SELECT * FROM {class}
WHERE '{phrase}' ~ ' ' || ( rtrim( FieldN ) || '$' )
OR '{phrase}' ~ '^' || ( rtrim( FieldN ) || ' ' )
OR '{phrase}' ~ ' ' || ( rtrim( FieldN ) || ' ' )
OR '{phrase}' = rtrim( FieldN );

The equivalent with 'like' is:

SELECT * FROM {class}
WHERE '{phrase}' LIKE '% ' || rtrim( FieldN )
OR '{phrase}' LIKE rtrim( FieldN ) || ' %'
OR '{phrase}' LIKE '% ' || ( rtrim( FieldN ) || ' %' )
OR '{phrase}' = rtrim( FieldN );

If the spaces between the words in your phrase are not just plain spaces
but can also be tabs etc, you will have to use the regexp version, and
write something like '[ \t\r\f\n]' - though I'm less than sure that
postgres's regular expressions support these. Can anyone tell us which
regexp definition postgres uses?

Herouth

--
Herouth Maoz, Internet developer.
Open University of Israel - Telem project
http://telem.openu.ac.il/~herutma

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Herouth Maoz 1999-03-28 17:44:51 Re: [SQL] Re: [GENERAL] Determining if "in a text set"
Previous Message Herouth Maoz 1999-03-28 16:01:52 Re: [GENERAL] using Oids to retrieve a row

Browse pgsql-sql by date

  From Date Subject
Next Message Herouth Maoz 1999-03-28 17:44:51 Re: [SQL] Re: [GENERAL] Determining if "in a text set"
Previous Message Herouth Maoz 1999-03-28 15:48:20 Re: [SQL] Odd "problem", not sure if there is a solution ....