
Re: phrase search

From: Sushant Sinha <sushant354(at)gmail(dot)com>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: phrase search
Date: 2008-06-03 03:28:12
Message-ID: 1212463692.8047.51.camel@dragflick
Lists: pgsql-hackers
On Mon, 2008-06-02 at 19:39 +0400, Teodor Sigaev wrote:
> 
> > I have attached a patch for phrase search with respect to the cvs head.
> > Basically it takes a phrase (text) and a TSVector. It checks whether the
> > relative positions of the lexemes in the phrase are the same as their
> > positions in the TSVector.
> 
> Ideally, phrase search should be implemented as a new operator in tsquery, say #,
> with an optional distance. So the tsquery 'foo #2 bar' means: find all texts where
> 'bar' is placed no further than two words from 'foo'. The complexity lies in complex
> boolean expressions ( 'foo #1 ( bar1 & bar2 )' ) and in languages such as
> Norwegian or German. German has compound words, like footballbar, that can be
> split in several ways, so the result of to_tsquery('foo # footballbar') will be
> 'foo # ( ( football & bar ) | ( foot & ball & bar ) )',
> where the variants are connected with the OR operation.

This is far more complicated than I thought.
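
To make the simplest case concrete, here is a minimal sketch of the distance check behind something like 'foo #2 bar', in plain C rather than against the tsearch data structures. It assumes that each lexeme comes with a sorted array of word positions, and that "no further than two words from" means 'bar' follows 'foo' by at most that many positions; the proposal does not say whether order matters, so that reading is an assumption. The name within_distance is hypothetical and not part of any patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns true if some position in b[] follows some
 * position in a[] by at least 1 and at most dist words.
 * Both arrays must be sorted in ascending order.
 */
bool
within_distance(const int *a, size_t na, const int *b, size_t nb, int dist)
{
    size_t      i = 0;
    size_t      j = 0;

    assert(dist >= 1);

    while (i < na && j < nb)
    {
        if (b[j] <= a[i])
            j++;                /* b[j] is not after a[i], nor after any later a */
        else if (b[j] - a[i] <= dist)
            return true;        /* b[j] follows a[i] closely enough */
        else
            i++;                /* every remaining b is too far from a[i] */
    }
    return false;
}
```

This covers only two plain lexemes. Boolean subexpressions and alternative splittings of compound words, which are the hard part raised above, are not handled.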

> Of course, phrase search should be able to use indexes.

I can probably look into how to use an index. Any pointers on this?

> > 
> > If the configuration for text search is "simple", then this will produce
> > exact phrase search. Otherwise the stopwords in a phrase will be ignored
> > and the words in a phrase will only be matched with the stemmed lexeme.
> 
> Your solution can't be used as is, because the user must also use a tsquery to
> get an index scan:
> 
> column @@ to_tsquery('phrase & search') AND is_phrase_present('phrase search', column)
> 
> The first clause will be used for the index scan and will quickly find the candidates.

Yes, this is exactly how I am using it in my application. Do you think this
will cover most of the common cases, or should we try to get phrase search
to do the following?

1. Use an index
2. Support arbitrary distance between lexemes
3. Support complex boolean queries
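
For reference, the recheck step of the current approach can be sketched as follows. This is not the attached patch: it is a self-contained C illustration that assumes each lexeme of the phrase carries its offset inside the phrase plus a sorted array of its positions in the document, so that skipped stopwords simply leave gaps in the offsets. PhraseLexeme, has_position and phrase_present are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one lexeme of the phrase. */
typedef struct
{
    int         offset;     /* position of the lexeme inside the phrase */
    const int  *pos;        /* sorted positions of the lexeme in the document */
    size_t      npos;       /* number of entries in pos */
} PhraseLexeme;

/* Linear scan of a sorted position list for one wanted position. */
static bool
has_position(const PhraseLexeme *lex, int wanted)
{
    size_t      k;

    for (k = 0; k < lex->npos; k++)
    {
        if (lex->pos[k] == wanted)
            return true;
        if (lex->pos[k] > wanted)
            break;              /* sorted, so it cannot appear later */
    }
    return false;
}

/*
 * Returns true if the document contains the lexemes at the same relative
 * positions as in the phrase. Each occurrence of the first lexeme is tried
 * as the anchor for the whole phrase.
 */
bool
phrase_present(const PhraseLexeme *lex, size_t nlex)
{
    size_t      s;
    size_t      k;

    assert(nlex > 0);

    for (s = 0; s < lex[0].npos; s++)
    {
        int         base = lex[0].pos[s] - lex[0].offset;

        for (k = 1; k < nlex; k++)
        {
            if (!has_position(&lex[k], base + lex[k].offset))
                break;
        }
        if (k == nlex)
            return true;
    }
    return false;
}
```

Supporting an arbitrary distance (item 2) would mean replacing the equality test in has_position with a range test; items 1 and 3 need support inside tsquery itself.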

-Sushant. 

> 
> > For my application I am using this as a separate shared object. I do not
> > know how to expose this function from the core. Can someone explain how
> > to do this?
> 
> Look at the contrib/ directory in pgsql's source code and make a contrib module
> from your patch. As an example, look at the adminpack module; it's rather simple.
> 
> Comments on your code:
> 1)
> +#ifdef PG_MODULE_MAGIC
> +PG_MODULE_MAGIC;
> +#endif
> 
> That isn't needed for files compiled into core; it's only needed for loadable modules.
> 
> 2)
>   Use only /* */ comments; do not use // (C++ style) comments.

