Re: Latin vs non-Latin words in text search parsing

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Gregory Stark <stark(at)enterprisedb(dot)com>
Cc:	"Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Oleg Bartunov" <oleg(at)sai(dot)msu(dot)su>, "Teodor Sigaev" <teodor(at)sigaev(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Latin vs non-Latin words in text search parsing
Date:	2007-10-23 14:42:41
Message-ID:	11092.1193150561@sss.pgh.pa.us
Lists:	pgsql-hackers

I wrote:
> Maybe "aword", "word", and "numword"?

Does the lack of response mean people are satisfied with that?

Fleshing the proposal out to include the hyphenated-word categories:

aword           All ASCII letters
word            All letters according to iswalpha()
numword         Mixed letters and digits (all iswalnum())

ahword          Hyphenated word, all ASCII letters
hword           Hyphenated word, all letters
numhword        Hyphenated word, mixed letters and digits

apart_hword     Part of hyphenated word, all ASCII letters
part_hword      Part of hyphenated word, all letters
numpart_hword   Part of hyphenated word, mixed letters and digits

(As an example, "foo-beta1" is a numhword, with component tokens
"foo" an aword and "beta1" a numword. This is how it works now
modulo the redefinition of the base categories.)
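
To make the category rules above concrete, here is a rough Python sketch of the classification. It is an illustration only, not the parser's actual code: classify_base() and classify_hyphenated() are hypothetical names, Python's str.isalpha() and str.isalnum() stand in for iswalpha() and iswalnum() (they are close but not identical, and do not depend on the locale), and the sketch ignores every other token type the real parser knows about (plain numbers, URLs, and so on). Components are reported under their base category, as in the "foo-beta1" example; presumably they would carry the matching *part_hword name when emitted as parts.

```python
# Hypothetical sketch of the proposed token categories; not PostgreSQL code.

# Base categories ordered from most specific to most general.
RANK = {"aword": 0, "word": 1, "numword": 2}
HYPHENATED = ["ahword", "hword", "numhword"]


def classify_base(token):
    """Return 'aword', 'word', 'numword', or None if the token fits none."""
    if not token:
        return None
    if all(c.isascii() and c.isalpha() for c in token):
        return "aword"      # all ASCII letters
    if all(c.isalpha() for c in token):
        return "word"       # all letters (stand-in for iswalpha())
    if all(c.isalnum() for c in token):
        return "numword"    # letters and digits (stand-in for iswalnum())
    return None


def classify_hyphenated(token):
    """Return (category, [(part, base_category), ...]) or None.

    The whole token takes the most general category found among its parts.
    """
    parts = token.split("-")
    if len(parts) < 2:
        return None         # not hyphenated
    kinds = [classify_base(p) for p in parts]
    if any(k is None for k in kinds):
        return None         # empty or unclassifiable component
    whole = HYPHENATED[max(RANK[k] for k in kinds)]
    return whole, list(zip(parts, kinds))


if __name__ == "__main__":
    print(classify_hyphenated("foo-beta1"))
    # ('numhword', [('foo', 'aword'), ('beta1', 'numword')])
```

Taking the most general category among the parts gives the same answer as classifying the whole string with the hyphens removed, which is why "foo-beta1" comes out as a numhword even though "foo" on its own is an aword.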

I'm not totally thrilled with these short names for the hyphenation
categories, but they will seem at least somewhat familiar to users
of contrib/tsearch2, and it's probably not worth changing them just
to make them look prettier.

regards, tom lane

In response to

Re: Latin vs non-Latin words in text search parsing at 2007-10-22 14:36:04 from Tom Lane

Responses

Re: Latin vs non-Latin words in text search parsing at 2007-10-23 14:49:18 from Tom Lane
Re: Latin vs non-Latin words in text search parsing at 2007-10-23 14:52:19 from Michael Glaesemann
Re: Latin vs non-Latin words in text search parsing at 2007-10-23 15:19:19 from Gregory Stark
