From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Gregory Stark <stark(at)enterprisedb(dot)com> |
Cc: | "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Oleg Bartunov" <oleg(at)sai(dot)msu(dot)su>, "Teodor Sigaev" <teodor(at)sigaev(dot)ru>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Latin vs non-Latin words in text search parsing |
Date: | 2007-10-22 14:36:04 |
Message-ID: | 6225.1193063764@sss.pgh.pa.us |
Lists: | pgsql-hackers |
Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com> writes:
>> I like the "aword" name more than "lword", BTW. If we change the meaning
>> of the classes, surely we can change the name as well, right?
> I'm not very familiar with the use case here. Is there a good reason to want
> to abbreviate these names? I think I would expect "ascii", "word", and "token"
> for the three categories Tom describes.

Please look at the first nine rows of the table here:
http://developer.postgresql.org/pgdocs/postgres/textsearch-parsers.html

It's not clear to me where we'd go with the names for the
hyphenated-word and hyphenated-word-part categories. Also, ISTM that
we should use related names for these three categories, since they are
all considered valid parts of hyphenated words.

Another point: "token" is probably unreasonably confusing as a name for
a token type. "Is that a token token or a word token?"

Maybe "aword", "word", and "numword"?

regards, tom lane