Re: Latin vs non-Latin words in text search parsing

From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Oleg Bartunov" <oleg(at)sai(dot)msu(dot)su>, "Teodor Sigaev" <teodor(at)sigaev(dot)ru>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Latin vs non-Latin words in text search parsing
Date: 2007-10-23 15:19:19
Message-ID: 87k5pdq2o8.fsf@oxford.xeocode.com
Lists: pgsql-hackers

"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> I wrote:
>> Maybe "aword", "word", and "numword"?
>
> Does the lack of response mean people are satisfied with that?

Sorry, I had a couple of responses partially written but never finished them.

If we were doing it from scratch I would suggest using longer names. At the
least I would still suggest using "ascii" or "asciiword" instead of "aword".

> Fleshing the proposal out to include the hyphenated-word categories:
>
> aword     All ASCII letters
> word      All letters according to iswalpha()
> numword   Mixed letters and digits (all iswalnum())

This does bring up another idea: using the ctype names. They could be named
asciiword, alphaword, and alnumword. Frankly, I don't think this is any nicer
than numword anyway.
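
To make the three categories concrete, here is a minimal sketch of how a
single token might be sorted into them. It is an illustration only, not the
parser's actual code: classify_token is a hypothetical helper, Python's
str.isalpha() and str.isalnum() stand in for iswalpha() and iswalnum(), the
"asciiword" label assumes the longer name suggested above is adopted, and
"other" is a made-up catch-all for anything else.

```python
def classify_token(token: str) -> str:
    """Return a category name for one token (hypothetical helper).

    Assumptions: str.isalpha()/str.isalnum() approximate the C functions
    iswalpha()/iswalnum(); "other" is an invented fallback label.
    """
    # All letters, and every one of them ASCII.
    if token.isascii() and token.isalpha():
        return "asciiword"
    # All letters, at least one of them non-ASCII.
    if token.isalpha():
        return "word"
    # Letters and digits mixed; require at least one letter so that a
    # purely numeric token is not counted as a word.
    if token.isalnum() and any(ch.isalpha() for ch in token):
        return "numword"
    return "other"


if __name__ == "__main__":
    for sample in ["hello", "naïve", "abc123", "123", "foo-bar"]:
        print(sample, "->", classify_token(sample))
```

The checks run from the narrowest category to the widest, so a token gets
the most specific name that fits it.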

> I'm not totally thrilled with these short names for the hyphenation
> categories, but they will seem at least somewhat familiar to users
> of contrib/tsearch2, and it's probably not worth changing them just
> to make them look prettier.

I tried thinking of better words for this and couldn't come up with any. The
only other term for a hyphenated word I could think of is "compound", and the
word for the parts of a compound word is "lexeme", but that's certainly not
going to be clearer (and technically it's not quite right anyway).

So, in short, I would still suggest using "ascii" instead of just "a", but
otherwise I think your suggestion is best.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
