Re: Snowball and ispell in tsearch2

From: Markus Schiltknecht <markus(at)bluegap(dot)ch>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: Snowball and ispell in tsearch2
Date: 2006-06-07 17:29:56
Message-ID: 44870D14.5030401@bluegap.ch
Lists: pgsql-hackers

Hello Teodor,

I've just recently implemented an advanced full-text search function on
top of tsearch2. Searching through the manuals and websites to get the
Snowball stemmer and compile my own module took me way too long. I'd
rather go fetch a cup of coffee during a 30-minute download...

That said, I don't necessarily mean that all stemmers must be included
in CVS or such. It should just be simpler for the database administrator
to install ispell or stemmer 'modules'. The ideal solution would
be to provide packages for each language (for Debian, Fedora, etc.).

Perhaps we can put together the source code for all language modules
available and provide scripts to fetch ispell data or to generate the
Snowball stemmers. A Debian package maintainer would have to fetch all
the data to generate all language packages. Someone else might just want
to download and compile a Norwegian Snowball stemmer.

I'd be willing to help with such a project. I have experience with
tsearch2 as well as with Gentoo and Debian packaging. I can't help with
RPM, though.

Regards

Markus

Teodor Sigaev wrote:
> We got a lot of requests about including stemmers and ispell dictionaries
> for all available languages in tsearch2. I understand that tsearch2
> would then be closer to the end user. But the sources of the Snowball
> stemmers are about 800kb, and each ispell dictionary will take about
> 0.5-2M. All sizes are after compression. I am afraid that is too big...
>
> What are opinions?
>
