Re: possible bug in cover density ranking?

From: Sushant Sinha <sushant354(at)gmail(dot)com>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: possible bug in cover density ranking?
Date: 2009-01-29 18:54:12
Message-ID: 9fb559330901291054l13164cf8ia45caa1ea52cdc96@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jan 29, 2009 at 12:38 PM, Teodor Sigaev <teodor(at)sigaev(dot)ru> wrote:

>> Is this what is desired? It seems to me that Wdoc is getting a high
>> ranking even when we are not sure of the position information.
>>
> 0.1 is not a very high rank, and we cannot suggest any reasonable rank in
> this case. This document may be good or it may be bad. rank_cd is not
> limited to 1.

For a cover of 2 query items, 0.1 is actually the maximum rank. This is only
possible when both query items are adjacent to each other.

0.1 may not seem too high when we look at its absolute value. But the problem
is that we are ranking a document for which we have no positional information
higher than a document for which we do have positional information with,
say, a cover length of 3. I think we should rank the document with cover
length 3 higher than the document for which we have no positional
information (and for which we assume a cover length of 2, as we are doing
now).
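
To make the numbers concrete, here is a rough Python model of the per-cover
term. It assumes a reading of calc_rank_cd() in tsrank.c where
Cpos = (number of items) / InvSum, nNoise = (ext.q - ext.p) - (ext.end - ext.begin),
and Wdoc += Cpos / (1 + nNoise), and it assumes every lexeme carries the
default D weight of 0.1. The helper cover_contribution() and the sample
positions are hypothetical, made up for illustration only.

```python
# Rough model of one cover's contribution to Wdoc in calc_rank_cd()
# (tsrank.c). Assumes every lexeme has the default D weight of 0.1.
# cover_contribution() is a hypothetical helper, not PostgreSQL code.

MAX_POS = 16383  # tsvector positions are capped at this value


def cover_contribution(n_items: int, p: int, q: int, weight: float = 0.1) -> float:
    """n_items query items found in a cover spanning positions p..q."""
    inv_sum = n_items * (1.0 / weight)   # InvSum: sum of inverse weights
    cpos = n_items / inv_sum             # Cpos
    items_gap = n_items - 1              # ext.end - ext.begin
    n_noise = (q - p) - items_gap        # non-query words inside the cover
    if n_noise < 0:
        # ext.p == ext.q: positions hit the cap, so the code falls back to
        # half of (ext.end - ext.begin), using integer division
        n_noise = items_gap // 2
    return cpos / (1 + n_noise)


adjacent = cover_contribution(2, 10, 11)          # best possible pair
length_3 = cover_contribution(2, 10, 12)          # one noise word in between
no_pos = cover_contribution(2, MAX_POS, MAX_POS)  # positions unknown

print(adjacent, length_3, no_pos)  # 0.1 0.05 0.1
```

Under these assumptions the cover with unknown positions scores 0.1, the same
as an adjacent pair and twice the 0.05 of a cover of length 3.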

I feel that if ext.p = ext.q for a cover with more than one query item, then
we should not count that cover for ranking at all. Another option would be to
significantly inflate nNoise in this scenario to, say, 100. Setting
nNoise = (ext.end - ext.begin)/2 is far too low for covers we know nothing
about (it is 0 for 2 query items).
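
The two alternatives can be sketched the same way. This is a hypothetical
illustration, not a patch: the policy names ("current", "skip", "inflate"),
the helpers noise_estimate() and contribution(), and the fixed Cpos of 0.1
for two D-weighted query items are assumptions made for the example.

```python
# Hypothetical sketch of two alternatives for covers whose positions are
# unreliable (ext.p == ext.q). Policy names and helpers are illustrative
# only, not PostgreSQL code.
from typing import Optional

POLICIES = ("current", "skip", "inflate")
INFLATED_NOISE = 100  # the "say, 100" figure from the proposal


def noise_estimate(items_gap: int, p: int, q: int, policy: str) -> Optional[int]:
    """Return nNoise for a cover, or None if the cover should not count.
    items_gap corresponds to ext.end - ext.begin."""
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy}")
    n_noise = (q - p) - items_gap
    if n_noise >= 0:
        return n_noise            # positions are usable: nothing changes
    if policy == "skip":
        return None               # drop the cover from the ranking
    if policy == "inflate":
        return INFLATED_NOISE     # heavy penalty for an unknown cover
    return items_gap // 2         # existing approximation


def contribution(cpos: float, items_gap: int, p: int, q: int, policy: str) -> float:
    """Cpos / (1 + nNoise), or 0.0 for a skipped cover."""
    n_noise = noise_estimate(items_gap, p, q, policy)
    return 0.0 if n_noise is None else cpos / (1 + n_noise)


# Two D-weighted query items: Cpos = 0.1, ext.end - ext.begin = 1.
for policy in POLICIES:
    unknown = contribution(0.1, 1, 16383, 16383, policy)
    known_len_3 = contribution(0.1, 1, 10, 12, policy)
    print(policy, unknown, known_len_3)
```

Under either alternative the cover of length 3 (0.05) outranks the cover with
no positional information, while covers with usable positions are unaffected.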

I am not assuming or suggesting that rank_cd is bounded by one. Of course,
the rank increases as more and more covers are added.

Thanks,
Sushant.

>
>
>
>> The comment above says that "In this case we approximate number of
>> noise word as half cover's length". But we do not know the cover's
>> length in this case, because ext.p and ext.q are both unreliable. And
>> ext.end - ext.begin is not the "cover's length"; it is the number of
>> query items found in the cover.
>>
>
> Yeah, but if there is no information then information is absent :), but I
> agree with you that the comment should be changed.
> --
> Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
> WWW: http://www.sigaev.ru/
>
