Techniques for quickly finding words in a phrase...

From: "Saltsgaver, Scott" <scottsa(at)aiinet(dot)com>
To: "'pgsql-sql(at)postgreSQL(dot)org'" <pgsql-sql(at)postgreSQL(dot)org>
Subject: Techniques for quickly finding words in a phrase...
Date: 2000-02-11 22:00:53
Message-ID: 7283DE19D141D111AD0E00A0C95B1955024888FD@mail2.aiinet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

I have a large table where one of the columns is a text phrase (ex. "MARY
HAD A LITTLE LAMB"). I want to be able to search on that table with
the start of one or more words from that column ("LITT" and "LAMB").
I built a separate table whose foreign key maps back to my original table,
but also has a column with just the words from the phrase.

For example:

Table Phrase (indexed by id):
id | phrase
---+-----------------------------
1 | MARY HAD A LITTLE LAMB
2 | WHOSE FLEECE WAS WHITE AS SNOW

Table PhraseWords (indexed by word):
id | word
---+-------
1 | MARY
1 | HAD
1 | A
1 | LITTLE
1 | LAMB
2 | WHOSE
2 | FLEECE
2 | WAS
2 | WHITE
2 | AS
2 | SNOW

The number of rows in the Prase table is ~225K, and the number of rows
in the PhraseWords table is ~1M. And there are also some high-frequency
words in the PhraseWords table, such as "THE" which recur many times.
If the user attempts to search for the words "WAS WHIT SNOW" I generate
the following SQL:

SELECT p.id, phrase FROM Phrase AS p, PhraseWords AS pw
WHERE
((p.id = pw.id) AND word LIKE 'WAS%')
AND EXISTS (SELECT id FROM PhraseWords AS pw
WHERE (p.id = pw.id) AND word LIKE 'WHIT%')
AND EXISTS (SELECT id FROM PhraseWords AS pw
WHERE (p.id = pw.id) AND word LIKE 'SNOW%');

My EXPLAIN for this is:

Nested Loop (cost=8033.62 rows=172 width=48)
-> Index Scan using idx_phraseword on phraseword pw
(cost=6256.27 rows=867 width=4)

-> Index Scan using phrase_pkey on phrase p
(cost=2.05 rows=46669 width=44)

SubPlan
-> Index Scan using idx_phraseword on phraseword pw
(cost=6256.27 rows=1 width=4)

-> Index Scan using idx_phraseword on phraseword pw
(cost=6256.27 rows=1 width=4)

For some reason, the select still takes > 1 minute on a fairly decent
sized Linux box (500Mhz, 128MB ram). I was using IN clauses instead of
the EXISTS clauses, and was noticing the EXPLAIN telling me that it was
doing sequential table lookups. I then found in the FAQ that this is a
known issue. I was hoping that by using EXISTS I could do better.

So I guess my two questions are:

1) Is there a way to make this more efficient?
2) Is there another technique in general that will allow me to let
my users search the table for words (or the start of words, in
this case)?

Any help would be greatly appreciated!

Scott Saltsgaver

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Tom Lane 2000-02-11 23:08:43 Re: [SQL] Techniques for quickly finding words in a phrase...
Previous Message Bruce Momjian 2000-02-10 18:29:11 Re: [SQL] How to quietly increment a SEQUENCE?