Re: processing urls with tsearch2

From: "Laimonas Simutis" <laimis(at)gmail(dot)com>
To: "Oleg Bartunov" <oleg(at)sai(dot)msu(dot)su>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: processing urls with tsearch2
Date: 2007-09-17 20:42:12
Message-ID: 2b3e22740709171342n446cd300s9903b83e39bf4dd@mail.gmail.com
Lists: pgsql-general

Thanks for the advice. For now I went with the second option: preprocessing
the text before passing it to to_tsquery.
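
A minimal sketch of what such preprocessing could look like, in Python. The
function name `strip_tld`, the suffix list, and the regular expression are all
assumptions made for illustration; they are not the code actually used here,
and a real deployment would likely need a longer suffix list.

```python
import re

# Hypothetical suffix list (an assumption); extend it to cover the TLDs you care about.
TLD_SUFFIXES = ("com", "net", "org")

# Match a host-like token that ends in one of the suffixes and capture
# everything before the final ".suffix" part.
_HOST_RE = re.compile(
    r"\b((?:[a-z0-9-]+\.)*[a-z0-9-]+)\.(?:%s)\b" % "|".join(TLD_SUFFIXES),
    re.IGNORECASE,
)


def strip_tld(text):
    """Replace host-like tokens such as 'mywebsite.com' with 'mywebsite'."""
    return _HOST_RE.sub(r"\1", text)
```

The cleaned string would then be sent to the database as an ordinary query
parameter for to_tsquery (or to_tsvector when indexing), so the same
normalization is applied on both sides.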

However, I would like to see what it would take to hook some of the
available dictionaries into PostgreSQL on Windows. Does anyone have pointers
or ideas on where to start if I want to compile and add a dictionary to
tsearch2 in a Windows environment?

Thanks,

Laimis

On 9/13/07, Laimonas Simutis <laimis(at)gmail(dot)com> wrote:
>
> Is there any way to install the dictionary without make? That is, are binary
> versions of it available? I am running PostgreSQL on Windows servers...
>
> On 9/13/07, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> wrote:
> >
> > On Thu, 13 Sep 2007, Laimonas Simutis wrote:
> >
> > > Hey guys,
> > >
> > > maybe anyone using tsearch2 could advise on this. With the default
> > > installation, url, host and some other tokens are processed with the
> > > simple dictionary. Thus a term like mywebsite.com gets stored as
> > > 'mywebsite.com'. The parser correctly assigns the host token type to the
> > > term, but the dictionary the term gets routed through is simple, so what
> > > gets stored is mywebsite.com
> > >
> > > The questions are:
> > >
> > > 1) is there a dictionary available that I could use that will remove
> > > .com, .net, .org, etc? I could write one myself, but after seeing some
> > > sample dictionary implementations and the C code, which I try to avoid,
> > > I got a bit scared.
> >
> > Yes, we have dict_regex, which was developed by Sergey Karpov; see details at
> > http://lynx.sao.ru/~karpov/software/postgres_dict_regex.html
> > It uses the PCRE library and you need to know Perl regexps.
> >
> > >
> > > 2) has anyone else dealt with this maybe in a different way?
> >
> > Sure, preprocess the text using your preferred language before passing it to
> > to_tsvector
> >
> > >
> > >
> > > Thanks for any suggestions and help,
> > >
> > > Laimis
> > >
> >
> > Regards,
> > Oleg
> > _____________________________________________________________
> > Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
> > Sternberg Astronomical Institute, Moscow University, Russia
> > Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
> > phone: +007(495)939-16-83, +007(495)939-23-83
> >
>
>
