Re: using Tsearch2 for chemical text

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Rajarshi Guha <rguha(at)indiana(dot)edu>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: using Tsearch2 for chemical text
Date: 2007-07-26 05:53:45
Message-ID: Pine.LNX.4.64.0707260950280.18739@sn.sai.msu.ru
Lists: pgsql-general

On Wed, 25 Jul 2007, Rajarshi Guha wrote:

> Hi, I have a table with about 9M entries. The table has 2 fields: id and name
> which are of serial and text types respectively. I have an ordinary index on
> the text field which allows me to do searches in reasonable time. Most of my
> searches are of the form
>
> select * from mytable where name ~ 'some text query'
>
> I know that the Tsearch2 module will let me have very efficient text
> searches. But if I understand correctly, it's based on a language specific
> dictionary.

That's not right. Tsearch2 ships with dictionaries for several human
languages, but you can also write your own. A dictionary is just a C program.

>
> My problem is that the name column contains names of chemicals. Now for many
> cases this may simply be a number (1674-56-2) and in other cases it may be an
> alphanumeric string (such as (-)O-acetylcarnitine or
> 1,2-cis-dihydroxybenzoate). In some cases it is a well-known word (say viagra
> or calcium chloride or pentathol).
>
> My question is: will Tsearch2 be able to handle this type of text? Or will it
> be hampered by the fact that the bulk of the rows do not correspond to
> ordinary English?

Oh, sure. See, for example, our dict_regex dictionary, which we use for
astronomical search:
http://lynx.sao.ru/~karpov/software/postgres_dict_regex.html

This is a work in progress, but it works.
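
To give a rough idea of what a regex-driven dictionary does, here is a minimal
sketch in Python. It is only an illustration: it is not the dict_regex code and
not the C dictionary API. The patterns and the function name lexize_chem are
hypothetical, and it assumes the parser hands each chemical name to the
dictionary as a single token, which the default parser may not do.

```python
import re

# Hypothetical patterns for illustration; not taken from dict_regex.
# CAS registry number: 2-7 digits, 2 digits, 1 check digit (e.g. 1674-56-2).
CAS_NUMBER = re.compile(r"^\d{2,7}-\d{2}-\d$")
# A single-token chemical name: letters, digits, commas, hyphens, parentheses
# and signs, with at least one letter (e.g. 1,2-cis-dihydroxybenzoate).
CHEM_NAME = re.compile(r"^[A-Za-z0-9(),+\-]*[A-Za-z][A-Za-z0-9(),+\-]*$")


def lexize_chem(token):
    """Return a list of lexemes for a recognized token, or None so that
    the next dictionary in the chain can try the token instead."""
    token = token.strip()
    if CAS_NUMBER.match(token):
        # Keep the registry number intact as one lexeme.
        return [token]
    if CHEM_NAME.match(token):
        # Keep the whole name as one lexeme, normalized to lower case.
        return [token.lower()]
    return None


if __name__ == "__main__":
    for t in ["1674-56-2", "(-)O-acetylcarnitine", "calcium chloride"]:
        print(repr(t), "->", lexize_chem(t))
```

Returning None for an unrecognized token mirrors how a real dictionary declines
a token so that the next one in the configuration gets a chance at it.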

>
> -------------------------------------------------------------------
> Rajarshi Guha <rguha(at)indiana(dot)edu>
> GPG Fingerprint: 0CCA 8EE2 2EEB 25E2 AB04 06F7 1BB9 E634 9B87 56EE
> -------------------------------------------------------------------
> My Ethicator machine must have had a built-in moral
> compromise spectral phantasmatron! I'm a genius."
> -Calvin

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
