Compound words giving undesirable results with tsearch2

From: Lars Haugseth <njus(at)larshaugseth(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Compound words giving undesirable results with tsearch2
Date: 2006-05-30 13:39:48
Message-ID: 87lksjeiqa.fsf@durin.larshaugseth.com
Lists: pgsql-general

I've set up a database using tsearch2, configured with support for compound
words according to the excellent guide found here:

http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_compound_words

This works fine. There is, however, one drawback, and I'd like to know
whether it can be remedied. Let's say I want to search for records containing
the word 'fritekst', which is a Norwegian compound word meaning
'free text'.

testdb=# select to_tsquery('default_norwegian', 'fritekst');
to_tsquery
------------------------------
'fritekst' | 'fri' & 'tekst'
(1 row)

Now, this will indeed match those records, but it will also match any
records containing both of the words 'fri' and 'tekst', regardless of
whether they are next to each other or in completely different parts
of the text being indexed. In many situations, this will lead to a lot
of false matches from the user's perspective.
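
To make the problem concrete, here is a small Python sketch that models how
the query above is evaluated as a boolean expression over lexemes. It is not
tsearch2 itself: the function names and sample texts are made up for
illustration, and a plain whitespace split stands in for the real parser and
dictionaries.

```python
def lexemes_of(text):
    # Crude stand-in for to_tsvector (an assumption, not the real parser):
    # lowercase, split on whitespace, and add the parts of the compound
    # the way a compound-aware dictionary would.
    words = set(text.lower().split())
    if "fritekst" in words:
        words |= {"fri", "tekst"}
    return words


def matches(lexemes):
    # Models 'fritekst' | 'fri' & 'tekst', where & binds tighter than |.
    # Only the presence of lexemes is checked, never their positions.
    return "fritekst" in lexemes or ("fri" in lexemes and "tekst" in lexemes)


# Hypothetical sample records.
wanted = "Søk i fritekst"
unwanted = "Fri adgang til arkivet og en kort tekst om historien"
unrelated = "En kort tekst om historien"

print(matches(lexemes_of(wanted)))     # True: contains the compound itself
print(matches(lexemes_of(unwanted)))   # True: 'fri' and 'tekst' are far apart
print(matches(lexemes_of(unrelated)))  # False: only 'tekst' is present
```

The second record is the kind of false match described above: both parts
occur, but nowhere near each other.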

Ideas on how to handle this problem will be much appreciated.

--
Lars Haugseth

"If anyone disagrees with anything I say, I am quite prepared not only to
retract it, but also to deny under oath that I ever said it." -Tom Lehrer
