Re: phrase search

From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Cc: Sushant Sinha <sushant354(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: phrase search
Date: 2008-07-22 18:42:03
Message-ID: 488629FB.2030501@sigaev.ru
Lists: pgsql-hackers

>> 1. What is the meaning of such a query operator?
>>
>> foo #5 bar -> true if the document has word "foo" followed by "bar" at
>> 5th position.
>>
>> foo #<5 bar -> true if document has word "foo" followed by "bar" within
>> 5 positions
>>
>> foo #>5 bar -> true if document has word "foo" followed by "bar" after 5
>> positions

Sounds good, but maybe it's overkill.
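
To make the three proposed operators concrete, here is a minimal sketch of how they could be evaluated over the position lists stored in a tsvector. The function name phrase_match and the list-of-integers representation are hypothetical, and it assumes "#<5" means a gap strictly smaller than 5; the proposal does not say whether the bound is inclusive.

```python
def phrase_match(first_positions, second_positions, op, distance):
    """Return True if some occurrence of the second word follows some
    occurrence of the first word at a gap that satisfies the operator.

    op is one of "#", "#<", "#>" (hypothetical spellings from the proposal).
    Assumption: "#<" and "#>" are strict comparisons.
    """
    for p in first_positions:
        for q in second_positions:
            gap = q - p
            if gap <= 0:
                continue  # "followed by": the second word must come later
            if op == "#" and gap == distance:
                return True
            if op == "#<" and gap < distance:
                return True
            if op == "#>" and gap > distance:
                return True
    return False


# "foo" occurs at positions 1 and 10, "bar" at position 6.
foo_positions = [1, 10]
bar_positions = [6]
exact = phrase_match(foo_positions, bar_positions, "#", 5)
closer = phrase_match(foo_positions, bar_positions, "#<", 5)
farther = phrase_match(foo_positions, bar_positions, "#>", 5)
```

Here only the exact form matches, because the single forward gap is exactly 5.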

>> etc .....
>>
>> 2. How to implement such query operators?
>>
>> Should we modify QueryItem to include additional distance information or
>> is there any other way to accomplish it?
>>
>> Is the following list sufficient to accomplish this?
>> a. Modify to_tsquery
>> b. Modify TS_execute in tsvector_op.c to check new operator
Exactly

>>
>> Is there anything needed in rewrite subsystem?
Yes, of course: the rewrite system should support that operation.

>>
>> 3. Are these valid uses of the operators and if yes what would they
>> mean?
>>
>> foo #5 (bar & cup)
This must be supported, because lexize might return a sub-tsquery. For example,
the Russian ispell dictionary can return several lexemes: "adfg" can become
'adf | adfs | ad'. Norwegian and German are more complicated:
"abc" -> '(ab & c) | (a & bc) | abc'.
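
One possible reading of a distance operator whose right-hand side is a sub-tsquery is sketched below. It is only an assumption, not something settled in this thread: an OR node contributes the union of its operands' positions, and an AND node contributes the positions its operands share (on the further assumption that the parts of one split compound word get the same position). The tuple-based query tree and the names positions and followed_at are hypothetical.

```python
def positions(node, tsvector):
    """Positions at which a (sub)query can be considered to occur.

    node is ("lex", word), ("or", left, right) or ("and", left, right).
    tsvector maps each lexeme to a list of its positions in the document.
    """
    kind = node[0]
    if kind == "lex":
        return set(tsvector.get(node[1], ()))
    left = positions(node[1], tsvector)
    right = positions(node[2], tsvector)
    if kind == "or":
        return left | right  # any alternative may match
    if kind == "and":
        return left & right  # assumption: both parts at the same position
    raise ValueError("unknown node type: %r" % (kind,))


def followed_at(left, right, distance, tsvector):
    """Evaluate 'left #distance right' for arbitrary sub-queries."""
    right_positions = positions(right, tsvector)
    return any(p + distance in right_positions
               for p in positions(left, tsvector))


# foo #5 ((ab & c) | abc), using the shape of the compound-word example.
query_right = ("or", ("and", ("lex", "ab"), ("lex", "c")), ("lex", "abc"))
doc_compound = {"foo": [1], "ab": [6], "c": [6]}
doc_scattered = {"foo": [1], "ab": [6], "c": [9]}
hit = followed_at(("lex", "foo"), query_right, 5, doc_compound)
miss = followed_at(("lex", "foo"), query_right, 5, doc_scattered)
```

The first document matches because both parts of the compound sit at position 6; the second does not, since "ab" and "c" are at different positions.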

>> 4. If the operator only applies to two query items can we create an
>> index such that (foo, bar)-> documents[min distance, max distance]
>> How difficult is it to implement an index like this?
No, the index should execute the query 'foo & bar' and set the recheck flag to
true, so that 'foo #<5 bar' is executed against the original tsvector from the
table.
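
The two-step plan can be sketched roughly as follows. This is a toy model, not PostgreSQL code: the inverted index stores only which documents contain a lexeme, so it can answer 'foo & bar' but not the distance condition, and every candidate is rechecked against the positions kept in the table. All names (to_positions, build_index, within, search) are hypothetical, and "#<" is assumed to be a strict comparison.

```python
from collections import defaultdict


def to_positions(text):
    """Toy stand-in for a tsvector: lexeme -> list of 1-based positions."""
    vec = defaultdict(list)
    for pos, word in enumerate(text.split(), start=1):
        vec[word].append(pos)
    return dict(vec)


def build_index(table):
    """Inverted index without positions: lexeme -> set of document ids."""
    index = defaultdict(set)
    for doc_id, vec in table.items():
        for lexeme in vec:
            index[lexeme].add(doc_id)
    return index


def within(vec, first, second, limit):
    """Recheck step: is 'second' after 'first' with a gap below limit?"""
    return any(0 < q - p < limit
               for p in vec.get(first, ())
               for q in vec.get(second, ()))


def search(table, index, first, second, limit):
    # Step 1: the index answers 'first & second' (lossy for the phrase query).
    candidates = index.get(first, set()) & index.get(second, set())
    # Step 2: recheck each candidate on the stored positions.
    return sorted(d for d in candidates
                  if within(table[d], first, second, limit))


table = {
    1: to_positions("foo bar"),
    2: to_positions("foo a b c d e f bar"),
    3: to_positions("bar foo"),
    4: to_positions("foo only"),
}
index = build_index(table)
candidates = sorted(index["foo"] & index["bar"])
matches = search(table, index, "foo", "bar", 5)
```

Documents 1, 2 and 3 pass the index step, but only document 1 survives the recheck: in document 2 the words are too far apart, and in document 3 they are in the wrong order.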

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/
