possible bug in cover density ranking?

From: Sushant Sinha <sushant354(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: possible bug in cover density ranking?
Date: 2009-01-29 06:07:32
Message-ID: 1233209252.18692.24.camel@dragflick
Lists: pgsql-hackers

I am running postgres 8.3.1. In tsrank.c I am looking at the cover
density function used for ranking while doing text search:
float4
calc_rank_cd(float4 *arrdata, TSVector txt, TSQuery query, int method)

Here is the excerpt of code that I think may have a bug when the
document is big enough to exceed the 16383 position limit.

CODE
=======================================================
Cpos = ((double) (ext.end - ext.begin + 1)) / InvSum;

/*
 * if doc are big enough then ext.q may be equal to ext.p due to limit
 * of posional information. In this case we approximate number of
 * noise word as half cover's length
 */
nNoise = (ext.q - ext.p) - (ext.end - ext.begin);
if (nNoise < 0)
    nNoise = (ext.end - ext.begin) / 2;
Wdoc += Cpos / ((double) (1 + nNoise));
=======================================================

As per my understanding, ext.end - ext.begin + 1 is the number of query
items in the cover and ext.q - ext.p is the length of the cover.

So consider a query with two query items. When we run out of position
information, Cover returns ext.q = 16383 and ext.p = 16383, and the
number of query items = ext.end - ext.begin + 1 = 2.

nNoise becomes (16383 - 16383) - 1 = -1 and is then reset to
(ext.end - ext.begin) / 2 = 0 (integer division).
Wdoc is then incremented by Cpos / (1 + 0) = Cpos = 2/InvSum
= 2/(1/0.1 + 1/0.1) = 0.1
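
The arithmetic above can be reproduced with a small standalone sketch.
This is an illustration only, not code from tsrank.c: CoverSketch and
cover_contribution are hypothetical names, and InvSum = 20 assumes both
query items carry weight 0.1 as in the example.

```c
/* Hypothetical stand-in for the cover fields used in the excerpt. */
typedef struct
{
    int p;      /* position of the first word of the cover */
    int q;      /* position of the last word of the cover */
    int begin;  /* index of the first query item in the cover */
    int end;    /* index of the last query item in the cover */
} CoverSketch;

/*
 * Amount added to Wdoc for one cover, following the quoted excerpt.
 * inv_sum plays the role of InvSum (assumed to be precomputed).
 */
double
cover_contribution(CoverSketch ext, double inv_sum)
{
    double cpos = ((double) (ext.end - ext.begin + 1)) / inv_sum;
    int    n_noise = (ext.q - ext.p) - (ext.end - ext.begin);

    /* Fallback taken when positions are saturated (ext.p == ext.q). */
    if (n_noise < 0)
        n_noise = (ext.end - ext.begin) / 2;   /* integer division */

    return cpos / ((double) (1 + n_noise));
}
```

With these assumptions, a saturated cover {p = 16383, q = 16383,
begin = 0, end = 1} gets the same contribution as a cover in which the
two items are truly adjacent {p = 10, q = 11}, while a cover spanning
positions 10 to 110 gets a contribution one hundred times smaller.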

Is this what is desired? It seems to me that Wdoc is getting a high
ranking even when we are not sure of the position information.

The comment above says that "In this case we approximate number of
noise word as half cover's length". But we do not know the cover's
length in this case, as ext.p and ext.q are both unreliable. And
ext.end - ext.begin is not the cover's length. It is one less than the
number of query items found in the cover.

Any clarification would be useful.

Thanks,
-Sushant.
