Re: a tsearch2 (8.2.4) dictionary that only filters out stopwords

From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Jan Urbański <j(dot)urbanski(at)students(dot)mimuw(dot)edu(dot)pl>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: a tsearch2 (8.2.4) dictionary that only filters out stopwords
Date: 2007-11-09 12:01:04
Message-ID: 47344C00.2010305@enterprisedb.com
Lists: pgsql-hackers, pgsql-patches

Jan Urbański wrote:
>>> The solution I came up with was simple: write a dictionary, that does
>>> only one thing: looks up the lexeme in a stopwords file and either
>>> discards it or returns NULL.
>> Doesn't the "simple" dictionary handle this?
> 
> I don't think so. The 'simple' dictionary discards stopwords, but
> accepts any other lexemes. So if I use {'simple', 'pl_ispell'} for my
> config, I'll get rid of the stopwords, but I won't get any lexemes
> stemmed by ispell. Every lexeme that's not a stopword will produce the
> very same lexeme (this is how I think the 'simple' dictionary works).
> 
> My dictionary does basically the same thing as the 'simple' dictionary,
> but it returns NULL instead of the original lexeme in case the lexeme is
> not found in the stopwords file.

In the long term, what we really need is a more flexible way to chain
dictionaries. In this case, you would first check against one stopword 
list, eliminating 'od', then check the ispell dictionary, and then check 
another stopword list without 'od'.
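
To make the chaining idea concrete, here is a rough Python sketch. It is only
a toy model, not tsearch2 code: it assumes each dictionary returns None for
"not recognized, try the next one", an empty list for "stopword, discard", and
a list of lexemes otherwise. All names (stopword_filter, simple_dict,
toy_ispell, run_chain) and the word lists are made up for illustration.

```python
from typing import Callable, Dict, Iterable, List, Optional

# Toy model of a dictionary: token -> None (unknown), [] (stopword) or lexemes.
Dictionary = Callable[[str], Optional[List[str]]]


def stopword_filter(stopwords: Iterable[str]) -> Dictionary:
    """Stopword-only dictionary: discards stopwords, passes everything else on."""
    stop = {w.lower() for w in stopwords}

    def lookup(token: str) -> Optional[List[str]]:
        return [] if token.lower() in stop else None

    return lookup


def simple_dict(stopwords: Iterable[str]) -> Dictionary:
    """Like 'simple': discards stopwords, accepts every other token unchanged."""
    stop = {w.lower() for w in stopwords}

    def lookup(token: str) -> Optional[List[str]]:
        t = token.lower()
        return [] if t in stop else [t]

    return lookup


def toy_ispell(stems: Dict[str, str]) -> Dictionary:
    """Stand-in for an ispell dictionary: a hypothetical word -> stem table."""

    def lookup(token: str) -> Optional[List[str]]:
        stem = stems.get(token.lower())
        return None if stem is None else [stem]

    return lookup


def run_chain(chain: List[Dictionary], token: str) -> List[str]:
    """Ask each dictionary in order; the first non-None answer wins."""
    for dictionary in chain:
        result = dictionary(token)
        if result is not None:
            return result
    return []  # no dictionary recognized the token, so it is dropped


# Hypothetical data: one stopword list with 'od', a tiny stem table, and a
# second stopword list without 'od'.
stems = {"koty": "kot"}
chain = [stopword_filter(["od"]), toy_ispell(stems), stopword_filter(["i", "w"])]

# With 'simple' in front, the ispell stand-in is never reached.
simple_chain = [simple_dict(["od"]), toy_ispell(stems)]

print(run_chain(chain, "od"))           # []
print(run_chain(chain, "koty"))         # ['kot']
print(run_chain(simple_chain, "koty"))  # ['koty']
```

The last line shows the problem Jan describes: once 'simple' accepts a token,
nothing after it in the list gets a chance to stem it.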

I suggested that a while ago 
(http://archives.postgresql.org/pgsql-hackers/2007-08/msg01036.php). 
Hopefully Oleg or someone else gets around to restructuring the
dictionaries in a future release.

I wonder if you could hack the ispell dictionary file to treat 'oda'
specially?

-- 
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com
