From: | Kenneth Marshall <ktm(at)rice(dot)edu> |
---|---|
To: | Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov> |
Cc: | sushant354(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org, shamnad(at)gmail(dot)com |
Subject: | Re: dot to be considered as a word delimiter? |
Date: | 2009-06-02 12:47:25 |
Message-ID: | 20090602124725.GD18879@it.is.rice.edu |
Lists: | pgsql-hackers |
On Mon, Jun 01, 2009 at 08:22:23PM -0500, Kevin Grittner wrote:
> Sushant Sinha <sushant354(at)gmail(dot)com> wrote:
>
> > I think that dot should be considered as a word delimiter because
> > when a dot is not followed by a space, most of the time it is a typing
> > error. Besides, there are not many valid English words that have
> > a dot in between.
>
> It's not treating it as an English word, but as a host name.
>
> select ts_debug('english', 'Mr.J.Sai Deepak');
> ts_debug
> ---------------------------------------------------------------------------
> (host,Host,Mr.J.Sai,{simple},simple,{mr.j.sai})
> (blank,"Space symbols"," ",{},,)
> (asciiword,"Word, all ASCII",Deepak,{english_stem},english_stem,{deepak})
> (3 rows)
>
> You could run it through a dictionary which would deal with host
> tokens differently. Just be aware of what you'll be doing to
> www.google.com if you run into it.
>
> I hope this helps.
>
> -Kevin
>
In our uses of full text indexing, it is much more important to
be able to find host names and URLs than to find mistyped names.
My two cents.
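For reference, the trade-off Kevin describes can be seen directly. The
sketch below is not the dictionary-based approach he mentions; it swaps in
a simpler preprocessing step, assumed here purely for illustration, that
replaces every dot with a space using replace() before the text reaches
the parser. Output is omitted because it depends on the configuration in use.

```sql
-- Default parsing: 'Mr.J.Sai' is emitted as a single host token.
SELECT to_tsvector('english', 'Mr.J.Sai Deepak');

-- Illustrative preprocessing (an assumption, not from the thread):
-- turn every dot into a space so the parser sees separate words.
SELECT to_tsvector('english', replace('Mr.J.Sai Deepak', '.', ' '));

-- The same preprocessing also breaks up real host names, so the
-- parser no longer produces a host token for 'www.google.com'.
SELECT ts_debug('english', replace('www.google.com', '.', ' '));
```

Whether that loss is acceptable depends on how much the indexed text
relies on host names and URLs.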
Cheers,
Ken