Re: Full text indexing (and errors!)

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	"Mitch Vincent" <mitch(at)venux(dot)net>
Cc:	pgsql-sql(at)postgresql(dot)org
Subject:	Re: Full text indexing (and errors!)
Date:	2000-05-21 19:44:15
Message-ID:	10468.958938255@sss.pgh.pa.us
Lists:	pgsql-sql

"Mitch Vincent" <mitch(at)venux(dot)net> writes:
> attname | attdisbursion | starelid | staattnum | staop | stanullfrac | stacommonfrac | stacommonval | staloval | stahival
> ---------+---------------+----------+-----------+-------+-------------+---------------+--------------+----------+----------
> string  |    0.00208943 |  1161760 |         1 |  1066 |           0 |     0.0100436 | on           | 00       | zzz

Hmm, so the most common word is "on" accounting for about 1% of the
entries. Although I don't think that stacommonfrac directly affects
this particular query plan, it'd still be a good idea to try to push it
down. fti.c has a provision for ignoring "stop words", but its stopword
list seems to be empty by default. You might want to throw in "the" and
"on" and any other noise words you're unlikely to want to search for.
That should help reduce the size of the fti table, too...
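
For illustration only, here is a minimal sketch of what a stop-word check
can look like in C. It is not the code in fti.c: the names stop_words,
cmp_word and is_stop_word and the four sample words are assumptions made
up for this example, and it assumes each word has already been lowercased
before the lookup. The list is kept in strcmp() order so that bsearch()
can be used.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stop-word list (not taken from fti.c).
 * It must stay sorted in strcmp() order for bsearch() to work. */
static const char *const stop_words[] = { "a", "and", "on", "the" };

/* bsearch() comparator: key is the word itself, elem points at an array slot. */
static int cmp_word(const void *key, const void *elem)
{
    return strcmp((const char *) key, *(const char *const *) elem);
}

/* Returns 1 if word is in stop_words, 0 otherwise.
 * Assumes word is already lowercased. */
int is_stop_word(const char *word)
{
    size_t n = sizeof(stop_words) / sizeof(stop_words[0]);

    return bsearch(word, stop_words, n, sizeof(stop_words[0]), cmp_word) != NULL;
}
```

An indexing routine would call something like is_stop_word() on each word
and skip the insert into the fti table whenever it returns 1, which is
where the reduction in table size comes from.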

Actually ... waitasec. stacommonfrac *does* affect this query plan in
the 7.0 release. Before you do anything else, try enabling the new LIKE
estimator code (see contrib/likeplanning/ for details) and see what
sort of plan you get then. The estimated selectivity should go *way*
down, and that ought to change the plan.

You'd still be well advised to get rid of as many stop words as you can.

regards, tom lane

In response to

Re: Full text indexing (and errors!) at 2000-05-21 19:15:46 from Mitch Vincent
