| From: | Kenneth Marshall <ktm(at)rice(dot)edu> |
|---|---|
| To: | Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov> |
| Cc: | sushant354(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org, shamnad(at)gmail(dot)com |
| Subject: | Re: dot to be considered as a word delimiter? |
| Date: | 2009-06-02 12:47:25 |
| Message-ID: | 20090602124725.GD18879@it.is.rice.edu |
| Lists: | pgsql-hackers |
On Mon, Jun 01, 2009 at 08:22:23PM -0500, Kevin Grittner wrote:
> Sushant Sinha <sushant354(at)gmail(dot)com> wrote:
>
> > I think that dot should be considered as a word delimiter because
> > when a dot is not followed by a space, most of the time it is a
> > typing error. Besides, there are not many valid English words that
> > have a dot in between.
>
> It's not treating it as an English word, but as a host name.
>
> select ts_debug('english', 'Mr.J.Sai Deepak');
> ts_debug
> ---------------------------------------------------------------------------
> (host,Host,Mr.J.Sai,{simple},simple,{mr.j.sai})
> (blank,"Space symbols"," ",{},,)
> (asciiword,"Word, all ASCII",Deepak,{english_stem},english_stem,{deepak})
> (3 rows)
>
> You could run it through a dictionary which would deal with host
> tokens differently. Just be aware of what you'll be doing to
> www.google.com if you run into it.
>
> I hope this helps.
>
> -Kevin
>
In our uses for full text indexing, it is much more important to
be able to find host names and URLs than to find mistyped names.
My two cents.
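To make the trade-off concrete, here is a rough sketch of the two
approaches discussed above. It is only an illustration, not a tested
recipe: the configuration name english_nohost is made up for this
example, and the replace() call is just one naive way to treat a dot
as a delimiter before parsing.

```sql
-- See how the default parser classifies each token (host vs. asciiword).
SELECT alias, token, lexemes
FROM ts_debug('english', 'Mr.J.Sai Deepak www.google.com');

-- Default behaviour: the host token is kept whole, so a search for the
-- host name can match it.
SELECT to_tsvector('english', 'see www.google.com')
       @@ to_tsquery('english', 'www.google.com') AS host_is_findable;

-- Naive "dot as delimiter": turn dots into spaces before parsing.
-- This splits Mr.J.Sai into separate words, but it also breaks up
-- www.google.com, which is the cost Kevin warned about.
SELECT to_tsvector('english', replace('Mr.J.Sai Deepak www.google.com', '.', ' '));

-- Hypothetical configuration that handles host tokens differently:
-- copy the stock english configuration and drop the host mapping, so
-- host tokens are no longer indexed at all.
CREATE TEXT SEARCH CONFIGURATION english_nohost (COPY = pg_catalog.english);
ALTER TEXT SEARCH CONFIGURATION english_nohost DROP MAPPING FOR host;

SELECT to_tsvector('english_nohost', 'see www.google.com');
```

Either variation gives up host name matching to some degree, which is
why the default suits our use better.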
Cheers,
Ken