Re: Text Search

From: "D(dot) Duccini" <duccini(at)backpack(dot)com>
To: Greg Elisara <elisara(at)world-net(dot)co(dot)nz>
Cc: pgsql-novice(at)postgresql(dot)org
Subject: Re: Text Search
Date: 2001-08-11 14:16:56
Message-ID: Pine.GSO.4.03.10108110903450.7337-100000@ra.bpsi.net
Lists: pgsql-novice


There are several ways to implement this:

1. you can just use the built-in regex operator "~*" to hit the table.
this is one way to handle mixed-case searches; the other is to use some of
the functions such as "lower()" and "substring()"

but this is a somewhat nasty approach requiring big iron since you have to
touch every record to find a single match
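
to make approach #1 concrete, here is a minimal sketch in Python rather than
SQL. the sample rows and the helper name scan_reviews are made up for
illustration. it mimics what a "~*" match does when no index can help: a
case-insensitive regex is tested against every single row.

```python
import re

# hypothetical sample rows: (id, review text)
REVIEWS = [
    (1, "A Stunning debut album with lush strings."),
    (2, "Loud guitars, louder drums."),
    (3, "The STRINGS section carries the second half."),
]

def scan_reviews(rows, pattern):
    """Return ids of rows whose text matches pattern, ignoring case.

    Every row is examined, which is why this gets slow on big tables.
    """
    rx = re.compile(pattern, re.IGNORECASE)
    return [row_id for row_id, text in rows if rx.search(text)]

print(scan_reviews(REVIEWS, "strings"))
```

the cost grows with the number of rows times the length of each review, no
matter how rare the keyword is.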

2. you can do post-processing on your table and build a separate index of
just the "important" words -- i say "important" because it's up to you to
build a "stop-list" of words to skip (usually things like a/and/the/for/with
etc) -- and make insertions into a two-column table of word and "id" that
refers to the record with the info

this will be faster than approach #1 at the cost of disk space (disposable
these days) -- (the postgresql INDEXES at last look were not handling
HIGH-VOLUME DUPLICATES very elegantly, but that's okay, they will eventually)
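
a rough sketch of the word-index idea in Python, assuming the reviews are
already in memory as (id, text) pairs. the names STOP_WORDS, build_word_index
and search are hypothetical, and the tokenizing regex is just one reasonable
choice. each (word, id) pair held in the dictionary corresponds to one row of
the two-column table described above.

```python
import re
from collections import defaultdict

STOP_WORDS = {"a", "and", "the", "for", "with"}  # extend to taste

def build_word_index(rows):
    """Map each non-stop word to the set of record ids containing it."""
    index = defaultdict(set)
    for row_id, text in rows:
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            if word not in STOP_WORDS:
                index[word].add(row_id)
    return index

def search(index, *keywords):
    """Return sorted ids of records containing every keyword."""
    sets = [index.get(k.lower(), set()) for k in keywords]
    return sorted(set.intersection(*sets)) if sets else []

# hypothetical sample rows: (id, review text)
rows = [
    (1, "A stunning debut with lush strings"),
    (2, "Loud guitars and louder drums"),
    (3, "The strings carry the debut single"),
]
index = build_word_index(rows)
print(search(index, "strings", "debut"))
```

a lookup now touches only the entries for the requested words instead of
every review, which is the whole point of paying for the extra storage.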

3. you can treat your text as a "blob" (binary large object) problem, and
store your text in either separate files, or a separate file "system"

this allows you the freedom to let the reviews grow as large as you'd like
-- and store multiple formats very easily -- example, maintain a URL to
the text information in your table, and then use some other web searching
"engine" to index the text data -- like HTDIG. then all you need to do is
bury META data into the pages, or a self-referential URI that feeds back
into your database .cgi stuff
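
one way the "bury META data into the pages" step might look, sketched in
Python. the function name render_review_page and the meta tag name
"product-id" are invented for this example; whether a given engine (HTDIG or
otherwise) can hand a custom meta field back in its results is something to
check in that engine's docs.

```python
from html import escape

def render_review_page(product_id, title, review):
    """Build a standalone HTML page for one review.

    The meta tag name "product-id" is a made-up convention that carries
    the database key so a search hit can be mapped back to its record.
    """
    return (
        "<html><head>"
        f"<title>{escape(title)}</title>"
        f'<meta name="product-id" content="{int(product_id)}">'
        "</head><body>"
        f"<p>{escape(review)}</p>"
        "</body></html>"
    )

# hypothetical record
page = render_review_page(42, "Blue & Green", "Lush strings <3")
print(page)
```

the table then only needs to keep the id and the URL of each page, and the
review text itself can grow as large as it likes.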

just some random thoughts to get you thinking :)

-duck

On Sat, 11 Aug 2001, Greg Elisara wrote:

> Hi folks. I have a product table (music cds) that stores a review in one
> of the columns (datatype - text). The copy for the review can hold
> anywhere between 100-1500 words. The table holds a little over 200,000
> rows.
>
> I want to set up a way to search through the table using keyword
> searches on the reviews. What is the best way to approach this? Any help
> would be welcome.
>
> Regards. Greg.
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
> (send "unregister YourEmailAddressHere" to majordomo(at)postgresql(dot)org)
>

-----------------------------------------------------------------------------
david(at)backpack(dot)com BackPack Software, Inc. www.backpack.com
+1 651.645.7550 voice "Life is an Adventure.
+1 651.645.9798 fax Don't forget your BackPack!"
-----------------------------------------------------------------------------
