Re: How to switch off Snowball stemmer for tsearch2?

From: "Ivan Zolotukhin" <ivan(dot)zolotukhin(at)gmail(dot)com>
To: "Oleg Bartunov" <oleg(at)sai(dot)msu(dot)su>
Cc: "Dmitry Koterov" <dmitry(at)koterov(dot)ru>, "Postgres General" <pgsql-general(at)postgresql(dot)org>
Subject: Re: How to switch off Snowball stemmer for tsearch2?
Date: 2007-08-22 20:14:52
Message-ID: 751e56400708221314l36b8289i8bf9818d7185af0d@mail.gmail.com
Lists: pgsql-general

10 days is not suspicious at all if you need to pull out the text for
indexing using complex logic and/or a complex schema (i.e. most of the
time is spent retrieving the text, not indexing it). Example: you index
tree leaves (e.g. a table with 3 columns: id, parent_id and name) and want
to have a redundant text index. You therefore need to retrieve all of a
leaf's ancestors before calling to_tsvector(), or something like that.
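
To make the tree example concrete, here is a minimal sketch of the retrieval step in Python. It is only an illustration under assumptions: the sample rows, the ROWS list and the build_index_text() helper are hypothetical, and a real setup would read the rows from the table and pass each resulting string to to_tsvector().

```python
# Hypothetical in-memory stand-in for a tree table with columns
# (id, parent_id, name). The sample data is invented for illustration.
ROWS = [
    (1, None, "Electronics"),
    (2, 1, "Cameras"),
    (3, 2, "Lenses"),
    (4, 1, "Phones"),
]


def build_index_text(rows):
    """Return {leaf_id: text}; text joins the names from the root down to the leaf."""
    by_id = {row_id: (parent_id, name) for row_id, parent_id, name in rows}
    # Any id that appears as someone's parent_id is not a leaf.
    parent_ids = {parent_id for parent_id, _ in by_id.values() if parent_id is not None}

    result = {}
    for row_id in by_id:
        if row_id in parent_ids:
            continue  # skip inner nodes, only leaves get indexed
        names = []
        seen = set()
        current = row_id
        # Walk up the parent_id chain until the root (parent_id is None).
        while current is not None:
            if current in seen:
                raise ValueError(f"cycle detected at id {current}")
            seen.add(current)
            parent_id, name = by_id[current]
            names.append(name)
            current = parent_id
        result[row_id] = " ".join(reversed(names))
    return result


index_text = build_index_text(ROWS)
```

Each leaf costs one walk up the tree before any indexing happens, which is why the retrieval can dominate the total time on a large table.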

On 8/22/07, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> wrote:
> On Wed, 22 Aug 2007, Dmitry Koterov wrote:
>
> > Suppose I cannot add such synonyms, because:
> >
> > 1. There are a lot of surnames, and I cannot take care of all of them.
> > 2. After adding a new surname I have to re-calculate all full-text indices,
> > which costs too much (about 10 days to complete the recalculation).
> >
> > So, I need exactly what I asked for: to switch OFF stem guessing if a word
> > is not in the dictionary.
>
> No problem, just modify pg_ts_cfgmap, which contains the mapping from
> token types to dictionaries.
>
> If you change the configuration, you should rebuild the tsvector and reindex.
> 10 days looks very suspicious.
>
>
> >
> > On 8/22/07, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> wrote:
> >>
> >> On Wed, 22 Aug 2007, Dmitry Koterov wrote:
> >>
> >>> Hello.
> >>>
> >>> We use ispell dictionaries for tsearch2 (ru_ispell_cp1251).
> >>> Now the Snowball stemmer is also configured.
> >>>
> >>> How do I properly switch OFF the Snowball stemmer for Russian without
> >>> turning off the ispell stemmer? (It is really needed, because "Ivanov"
> >>> is not the same as "Ivan".)
> >>> Is it enough and correct to simply delete the row from pg_ts_dict or not?
> >>>
> >>> Here is the dump of pg_ts_dict table:
> >>
> >> Don't use a dump, a plain select would be better. In your case, I'd
> >> suggest following the standard way: create a synonym file like
> >> ivanov ivanov
> >> and use it before the other dictionaries. The synonym dictionary will
> >> recognize 'Ivanov' and return 'ivanov'.
> >>
> >>>
> >>> dict_name:       en_ispell
> >>> dict_init:       spell_init(internal)
> >>> dict_initoption: DictFile=/usr/lib/ispell/english.med,AffFile=/usr/lib/ispell/english.aff,StopFile=/usr/share/pgsql/contrib/english.stop
> >>> dict_lexize:     spell_lexize(internal,internal,integer)
> >>> dict_comment:    (none)
> >>>
> >>> dict_name:       en_stem
> >>> dict_init:       snb_en_init(internal)
> >>> dict_initoption: contrib/english.stop
> >>> dict_lexize:     snb_lexize(internal,internal,integer)
> >>> dict_comment:    English Stemmer. Snowball.
> >>>
> >>> dict_name:       ispell_template
> >>> dict_init:       spell_init(internal)
> >>> dict_initoption: (none)
> >>> dict_lexize:     spell_lexize(internal,internal,integer)
> >>> dict_comment:    ISpell interface. Must have .dict and .aff files
> >>>
> >>> dict_name:       ru_ispell_cp1251
> >>> dict_init:       spell_init(internal)
> >>> dict_initoption: DictFile=/usr/lib/ispell/russian.med,AffFile=/usr/lib/ispell/russian.aff,StopFile=/usr/share/pgsql/contrib/russian.stop.cp1251
> >>> dict_lexize:     spell_lexize(internal,internal,integer)
> >>> dict_comment:    (none)
> >>>
> >>> dict_name:       ru_stem_cp1251
> >>> dict_init:       snb_ru_init_cp1251(internal)
> >>> dict_initoption: contrib/russian.stop.cp1251
> >>> dict_lexize:     snb_lexize(internal,internal,integer)
> >>> dict_comment:    Russian Stemmer. Snowball. WINDOWS (cp1251) Encoding
> >>>
> >>> dict_name:       ru_stem_koi8
> >>> dict_init:       snb_ru_init_koi8(internal)
> >>> dict_initoption: contrib/russian.stop
> >>> dict_lexize:     snb_lexize(internal,internal,integer)
> >>> dict_comment:    Russian Stemmer. Snowball. KOI8 Encoding
> >>>
> >>> dict_name:       ru_stem_utf8
> >>> dict_init:       snb_ru_init_utf8(internal)
> >>> dict_initoption: contrib/russian.stop.utf8
> >>> dict_lexize:     snb_lexize(internal,internal,integer)
> >>> dict_comment:    Russian Stemmer. Snowball. UTF8 Encoding
> >>>
> >>> dict_name:       simple
> >>> dict_init:       dex_init(internal)
> >>> dict_initoption: (none)
> >>> dict_lexize:     dex_lexize(internal,internal,integer)
> >>> dict_comment:    Simple example of dictionary.
> >>>
> >>> dict_name:       synonym
> >>> dict_init:       syn_init(internal)
> >>> dict_initoption: (none)
> >>> dict_lexize:     syn_lexize(internal,internal,integer)
> >>> dict_comment:    Example of synonym dictionary
> >>>
> >>> dict_name:       thesaurus_template
> >>> dict_init:       thesaurus_init(internal)
> >>> dict_initoption: (none)
> >>> dict_lexize:     thesaurus_lexize(internal,internal,integer,internal)
> >>> dict_comment:    Thesaurus template, must be pointed Dictionary and DictFile
> >>>
> >>
> >> Regards,
> >> Oleg
> >> _____________________________________________________________
> >> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
> >> Sternberg Astronomical Institute, Moscow University, Russia
> >> Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
> >> phone: +007(495)939-16-83, +007(495)939-23-83
> >>
> >
>
> Regards,
> Oleg
