From: | Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> |
---|---|
To: | Reece Hart <reece(at)harts(dot)net> |
Cc: | pgsql-general <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: tsearch2 and hyphenated terms |
Date: | 2008-04-11 18:07:14 |
Message-ID: | Pine.LNX.4.64.0804112206030.21547@sn.sai.msu.ru |
Lists: | pgsql-general |
We have the same problem with object names in astronomy, so we implemented
dict_regex: http://vo.astronet.ru/arxiv/dict_regex.html
Check it out!
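
For illustration only, here is a rough Python sketch of the rewriting rule
described in the quoted message below (drop a hyphen when it is followed by 1
or 2 characters from [a-zA-Z0-9]). It is not dict_regex and does not use its
configuration syntax. The function name normalize_names is hypothetical, and
two details are assumptions added here: the hyphen must be preceded by an
alphanumeric character, and the short suffix must end the token.

```python
import re

# A hyphen that joins a name to a short suffix.
# Assumptions (not in the original message):
#   - the hyphen is preceded by an alphanumeric, so a bare "-1" is left alone
#   - the 1-2 character suffix ends the token (no alphanumeric follows it)
_SHORT_SUFFIX_HYPHEN = re.compile(
    r"(?<=[A-Za-z0-9])-(?=[A-Za-z0-9]{1,2}(?![A-Za-z0-9]))"
)


def normalize_names(text: str) -> str:
    """Remove hyphens that precede a 1-2 character alphanumeric suffix."""
    return _SHORT_SUFFIX_HYPHEN.sub("", text)


if __name__ == "__main__":
    for sample in ("MCL1 MCL-1", "IL-2", "beta-catenin"):
        print(sample, "->", normalize_names(sample))
```

Text normalized this way could be passed to to_tsvector (and the query string
to to_tsquery) so that both spellings produce the same lexeme. Longer
hyphenated words such as beta-catenin are left untouched.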
Oleg
On Thu, 10 Apr 2008, Reece Hart wrote:
> I'd like to use tsearch2 to index protein and gene names. Unfortunately,
> such names are written inconsistently and sometimes with hyphens. For
> example, MCL-1 and MCL1 are semantically equivalent but with the default
> parser and to_tsvector, I see this:
>
> unison@u8.3=> select to_tsvector('MCL1 MCL-1');
> to_tsvector
> -------------------------
> '-1':3 'mcl':2 'mcl1':1
>
> For the purposes of indexing these names, I suspect I'd get the majority
> of cases by removing a hyphen when it's followed by 1 or 2 chars from
> [a-zA-Z0-9]. Does that require a custom parser?
>
> Thanks,
> Reece
>
>
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru)
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83