Re: Question regarding contrib/fulltextindexing

From: "Derek Barrett" <derekbarrett(at)graffiti(dot)net>
To: <andrew(at)catalyst(dot)net(dot)nz>
Cc: pgsql-novice(at)postgresql(dot)org
Subject: Re: Question regarding contrib/fulltextindexing
Date: 2002-07-14 22:41:02
Message-ID: 20020714224102.22447.qmail@graffiti.net
Lists: pgsql-novice

Thanks Andrew, that solution seems easier to me.

Okay, so I create a lookup table for the description field.

CREATE TABLE lookup_description (
    pk          integer PRIMARY KEY,
    id          integer,        -- foreign key to the main table
    search_word varchar(50)
);

(pk is the primary key, id being a foreign key)

(By the way, any recommendation on sizing the search_word field? Are there strings so long that they aren't worth indexing?)

My user INSERTS the following string:

"The quick brown fox jumped over the moon and another fox."

In my code, I use a for loop to split the text string into an array of words and fill the lookup_description table with them. Of course, I will create a noise-word filter to remove words like "the", "a", and "an" from this list.

What about duplicate words? Should those be filtered out as well? In this example, "fox" is duplicated. I would assume that leaving the duplicates in might be useful later if I decide to implement a relevancy-ranked search (ranking the results by how many times "fox" is found).
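A minimal sketch of the loop described above, written in Python purely for illustration. The noise-word list, the function names, and the choice to lowercase and split on non-alphanumeric characters are all assumptions, not something settled in this thread. Duplicates are deliberately kept.

```python
import re

# Hypothetical noise-word list; a real filter would be much longer.
NOISE_WORDS = {"the", "a", "an", "and"}

def extract_search_words(text, noise_words=NOISE_WORDS):
    """Split text into lowercase words, dropping noise words but keeping duplicates."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in noise_words]

def build_rows(record_id, text, first_pk=1):
    """Return (pk, id, search_word) tuples shaped like lookup_description rows."""
    words = extract_search_words(text)
    return [(first_pk + i, record_id, w) for i, w in enumerate(words)]

rows = build_rows(1, "The quick brown fox jumped over the moon and another fox.")
for row in rows:
    print(row)
```

With "and" included in the noise words, this yields the same eight rows as the table below, with "fox" appearing twice.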

TABLE lookup_description

pk  id  search_word
--  --  -----------
1   1   quick
2   1   brown
3   1   fox
4   1   jumped
5   1   over
6   1   moon
7   1   another
8   1   fox

Then I can create an index on the search_word column. Later, when I do my SELECT query, I will join this lookup table to the main query.
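A rough sketch of the index and the join, again in Python so it can be run as-is. It uses the standard-library sqlite3 module with an in-memory database as a stand-in for PostgreSQL. The main table name (main_table), its columns, and the index name are assumptions made up for the example. The count(*) column shows how the kept duplicates could feed a simple ranking.

```python
import sqlite3

# In-memory SQLite database, standing in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical main table; the thread only mentions its description field.
cur.execute("CREATE TABLE main_table (id integer PRIMARY KEY, description text)")
cur.execute("""
    CREATE TABLE lookup_description (
        pk          integer PRIMARY KEY,
        id          integer REFERENCES main_table (id),
        search_word varchar(50)
    )
""")
# Index the column that the word search will filter on.
cur.execute("CREATE INDEX lookup_description_word_idx ON lookup_description (search_word)")

text = "The quick brown fox jumped over the moon and another fox."
cur.execute("INSERT INTO main_table (id, description) VALUES (1, ?)", (text,))
words = ["quick", "brown", "fox", "jumped", "over", "moon", "another", "fox"]
cur.executemany(
    "INSERT INTO lookup_description (pk, id, search_word) VALUES (?, ?, ?)",
    [(i + 1, 1, w) for i, w in enumerate(words)],
)

# Exact-word search: join the lookup table to the main table and
# count matching rows per record as a crude relevancy score.
cur.execute("""
    SELECT m.id, m.description, count(*) AS hits
    FROM main_table m
    JOIN lookup_description l ON l.id = m.id
    WHERE l.search_word = ?
    GROUP BY m.id, m.description
    ORDER BY hits DESC
""", ("fox",))
results = cur.fetchall()
print(results)
```

Because the match is an equality test on search_word, the index can be used for it, which is not the case for the regular-expression search on the description field.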

Is that the idea?

Derek

----- Original Message -----
From: Andrew McMillan <andrew(at)catalyst(dot)net(dot)nz>
Date: 14 Jul 2002 12:16:59 +1200
To: Derek Barrett <derekbarrett(at)graffiti(dot)net>
Subject: Re: [NOVICE] Question regarding contrib/fulltextindexing

> On Sun, 2002-07-14 at 08:53, Derek Barrett wrote:
> >
> > In my situation, I need to match exact words, so I've used regular expressions to search on a varchar(10000) field:
> >
> > SELECT *
> > FROM table
> > WHERE description ~* ('[^a-zA-Z0-9]($keyword[$x])[^a-zA-Z0-9]');
> >
> > Would this module still be useful in my situation?
>
> I was doing word search with a modified version of the fulltextindex
> code, but in the end I found it easier to write a perl program to
> build the index table - the trigger approach was more work to manage
> than it seemed it should be.
>
> Regards,
> Andrew.
> --
> --------------------------------------------------------------------
> Andrew @ Catalyst .Net.NZ Ltd, PO Box 11-053, Manners St, Wellington
> WEB: http://catalyst.net.nz/ PHYS: Level 2, 150-154 Willis St
> DDI: +64(4)916-7201 MOB: +64(21)635-694 OFFICE: +64(4)499-2267
> Are you enrolled at http://schoolreunions.co.nz/ yet?
>
>

