Re: possible bug in cover density ranking?

From: Sushant Sinha <sushant354(at)gmail(dot)com>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: possible bug in cover density ranking?
Date: 2009-05-02 01:20:34
Message-ID: 1241227234.4633.1.camel@dragflick
Lists: pgsql-hackers

I see this listed as an open item here:

http://wiki.postgresql.org/wiki/PostgreSQL_8.4_Open_Items

Any interest in fixing this?

-Sushant.

On Thu, 2009-01-29 at 13:54 -0500, Sushant Sinha wrote:
>
>
> On Thu, Jan 29, 2009 at 12:38 PM, Teodor Sigaev <teodor(at)sigaev(dot)ru>
> wrote:
> > > Is this what is desired? It seems to me that Wdoc is getting a high
> > > ranking even when we are not sure of the position information.
> >
> > 0.1 is not very high rank, and we could not suggest any reasonable
> > rank in this case. This document may be good, may be bad. rank_cd is
> > not limited by 1.
>
>
> For a cover of 2 query items, 0.1 is actually the maximum rank. This
> is only possible when both query items are adjacent to each other.
>
> 0.1 may not seem too high when we look at its absolute value. But the
> problem is that we are ranking a document for which we have no
> positional information higher than a document for which we do have
> positional information with, let us suppose, a cover length of 3. I
> think we should rank the document with cover length 3 higher than the
> document for which we have no positional information (and for which we
> assume a cover length of 2, as we do now).
>
> I feel that if ext.p=ext.q for a cover with more than one query item,
> then we should not count that cover for ranking at all. Another option
> would be to significantly inflate nNoise in this scenario, to say 100.
> Setting nNoise=(ext.end-ext.begin)/2 is way too low for covers whose
> length we know nothing about (it is 0 for query items = 2).
>
> I am not assuming or suggesting that rank_cd is bounded by one. Of
> course, the rank increases as more and more covers are added.
>
> Thanks,
> Sushant.
>
>
>
> > > The comment above says that "In this case we approximate number of
> > > noise word as half cover's length". But we do not know the cover's
> > > length in this case as ext.p and ext.q are both unreliable. And
> > > ext.end-ext.begin is not the "cover's length". It is the number of
> > > query items found in the cover.
> >
> > Yeah, but if there is no information then information is absent :),
> > but I agree with you to change comment
> > --
> > Teodor Sigaev   E-mail: teodor(at)sigaev(dot)ru
> >                 WWW: http://www.sigaev.ru/
>
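
For reference, the arithmetic in the quoted message can be sketched in a few lines of Python. This is only an illustrative model, not the actual ranking code. The function name cover_contribution, its parameters, the per-item weight of 0.1, and the formula Cpos / (1 + nNoise) are assumptions chosen so that the numbers match those quoted in the thread (0.1 for a cover of two adjacent query items). The unknown_noise and skip_unknown options are hypothetical knobs that model the two alternatives proposed above.

```python
def cover_contribution(p, q, n_items, weight=0.1,
                       unknown_noise=None, skip_unknown=False):
    """Contribution of one cover to the document rank (illustrative model).

    p, q     -- first and last word positions of the cover (ext.p, ext.q)
    n_items  -- number of query items in the cover (ext.end - ext.begin + 1)
    weight   -- assumed per-item weight; 0.1 reproduces the thread's numbers
    """
    # Assumed normalisation: item count divided by the sum of inverse weights.
    inv_sum = n_items * (1.0 / weight)
    cpos = n_items / inv_sum

    # Noise words = cover span minus the query items it contains.
    n_noise = (q - p) - (n_items - 1)

    if n_noise < 0:
        # Positional information is unreliable (for example p == q although
        # the cover holds more than one query item).
        if skip_unknown:
            return 0.0                       # proposal 1: ignore this cover
        if unknown_noise is not None:
            n_noise = unknown_noise          # proposal 2: inflate nNoise
        else:
            n_noise = (n_items - 1) // 2     # behaviour described in the thread

    return cpos / (1 + n_noise)


adjacent = cover_contribution(1, 2, 2)   # two adjacent query items
length3 = cover_contribution(1, 3, 2)    # one noise word in between
unknown = cover_contribution(7, 7, 2)    # no usable positions
inflated = cover_contribution(7, 7, 2, unknown_noise=100)
skipped = cover_contribution(7, 7, 2, skip_unknown=True)

print(adjacent, length3, unknown, inflated, skipped)
```

Under these assumptions the cover with no positional information scores the same as the best possible two-item cover and above the cover of length 3, which is the ordering questioned above. With either proposed change it falls below the cover of length 3.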
