| From: | Reece Hart <reece(at)harts(dot)net> |
|---|---|
| To: | pgsql-general <pgsql-general(at)postgresql(dot)org> |
| Subject: | tsearch2 and hyphenated terms |
| Date: | 2008-04-11 05:17:25 |
| Message-ID: | 1207891045.6903.14.camel@snafu |
| Lists: | pgsql-general |
I'd like to use tsearch2 to index protein and gene names. Unfortunately,
such names are written inconsistently, sometimes with hyphens and
sometimes without. For example, MCL-1 and MCL1 are semantically
equivalent, but with the default parser and to_tsvector I see this:
unison@u8.3=> select to_tsvector('MCL1 MCL-1');
       to_tsvector
-------------------------
 '-1':3 'mcl':2 'mcl1':1
For the purposes of indexing these names, I suspect I'd get the majority
of cases by removing a hyphen when it's followed by 1 or 2 chars from
[a-zA-Z0-9]. Does that require a custom parser?
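A minimal sketch of that rule, written as a regexp_replace pass over the text before it reaches to_tsvector rather than as a parser change. The negative lookahead is an assumption about what "followed by 1 or 2 chars" means: it restricts the match to suffixes of exactly one or two characters, so longer hyphenated words are left alone. The replacement is written as E'\\1' so that it behaves the same regardless of the standard_conforming_strings setting.

```sql
-- Sketch only: drop a hyphen when it is followed by exactly one or two
-- characters from [a-zA-Z0-9], then hand the result to to_tsvector.
-- The (?![a-zA-Z0-9]) lookahead is an assumed reading of the rule; it
-- stops the pattern from matching the first two letters of a longer word.
select to_tsvector(
         regexp_replace(
           'MCL1 MCL-1',
           '-([a-zA-Z0-9]{1,2})(?![a-zA-Z0-9])',
           E'\\1',   -- keep the captured suffix, discard the hyphen
           'g'       -- replace every occurrence, not just the first
         )
       );
```

Note that this pattern would also strip the minus sign from a short negative number standing on its own, such as " -1", so it is only suitable for text where that does not matter.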
Thanks,
Reece
--
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0