From: | Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> |
---|---|
To: | Teodor Sigaev <teodor(at)sigaev(dot)ru> |
Cc: | Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Ts_rank internals |
Date: | 2007-09-11 07:19:51 |
Message-ID: | Pine.LNX.4.64.0709111118150.2767@sn.sai.msu.ru |
Lists: | pgsql-hackers |
On Tue, 11 Sep 2007, Teodor Sigaev wrote:
>> I tried to understand how ts_rank works, but I failed. What does the Cover
>> function do? How does it work? What is the DocRepresentation data
>> structure like? I can see the definition of the struct, and the
>> get_docrep function to convert to that format, but by reading those I
>> can't figure out what the resulting DocRepresentation looks like.
>> I wonder if we could get rid of the istrue flag in QueryOperand, and use
>> a local BitmapSet variable instead? It seems wrong to have a temporary
>> flag that's only used in one function, in a struct that's used everywhere.
> It's a variation on the CDR (Cover Density Ranking) algorithm.
>
> It is based on the paper by Clarke et al., "Relevance Ranking for One to Three
> Term Queries" (http://citeseer.ist.psu.edu/clarke00relevance.html). Sorry, I
> lost the article itself, but maybe Oleg has it. A short and simple description
> is available at http://www2002.org/CDROM/refereed/643/node7.html.
>
> We changed the original algorithm to support lexeme weights; details are on
> Oleg's site: http://www.sai.msu.su/~megera/wiki/NewExtentsBasedRanking
Actually, we used two papers:
http://citeseer.ist.psu.edu/clarke00relevance.html
and
http://portal.acm.org/ft_gateway.cfm?id=333137&type=pdf&dl=GUIDE&dl=ACM
I can send you the latter if you have no access to the ACM.
>
> An array of DocRepresentation is a representation of the document. It contains
> only the lexemes that occur in both the tsvector and the tsquery, ordered by
> position, as in the original document. Each DocRepresentation links to its
> corresponding QueryOperand to speed up query execution during the extent
> search. When we enlarge the current extent by one word, we set the istrue flag
> on the corresponding QueryOperand, and executing the tsquery over the cover
> becomes a very simple task.
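
A rough illustration of the structure described above, written in Python rather
than C and not taken from the ts_rank source. The names DocEntry, build_docrep,
find_cover and all_true are hypothetical, the tsvector is modelled as a plain
dict of lexeme to positions, and the query is assumed to be a simple AND of its
operands. The extent is grown to the right until the query matches, then
shrunk from the right end backwards to get the shortest extent ending there.

```python
from dataclasses import dataclass


@dataclass
class QueryOperand:
    lexeme: str
    istrue: bool = False  # scratch flag, only meaningful during the extent search


@dataclass
class DocEntry:
    # Hypothetical stand-in for one DocRepresentation element.
    pos: int
    operand: QueryOperand  # link back to the matching query operand


def build_docrep(tsvector, operands):
    """Keep only lexemes present in both document and query, ordered by position."""
    entries = [DocEntry(pos, op)
               for op in operands
               for pos in tsvector.get(op.lexeme, [])]
    entries.sort(key=lambda e: e.pos)
    return entries


def all_true(operands):
    """Assumed query shape: every operand must be present (a AND b AND ...)."""
    return all(op.istrue for op in operands)


def find_cover(doc, operands, start, matches):
    """Return (first_pos, last_pos) of a cover starting at or after doc[start]."""
    for op in operands:
        op.istrue = False
    last = None
    # Grow the extent one entry at a time, flagging the operand each entry links to.
    for i in range(start, len(doc)):
        doc[i].operand.istrue = True
        if matches(operands):
            last = i
            break
    if last is None:
        return None
    # Walk back from the right end to find the shortest extent that still matches.
    for op in operands:
        op.istrue = False
    for i in range(last, start - 1, -1):
        doc[i].operand.istrue = True
        if matches(operands):
            return doc[i].pos, doc[last].pos
    return None
```

For a document with "cat" at positions 1 and 7 and "dog" at position 4, the
query cat & dog gives the entries (1, cat), (4, dog), (7, cat), and the search
from the first entry yields the extent from position 1 to position 4.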
>
> It's possible to eliminate the istrue flag, but that requires implementing an
> algorithm that executes a tsquery over a continuous part of the document
> rather than over the whole document.
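
To make the alternative concrete, here is a minimal sketch in Python (not the
PostgreSQL code) of evaluating a query over a continuous slice of the document
with a local set in place of a per-operand flag. The nested-tuple query format
and the names eval_query and matches_span are assumptions made for this
illustration only.

```python
def eval_query(node, present):
    """Evaluate a boolean query tree against the set of lexemes seen in a span."""
    if isinstance(node, str):
        return node in present
    op, *args = node
    if op == "and":
        return all(eval_query(a, present) for a in args)
    if op == "or":
        return any(eval_query(a, present) for a in args)
    if op == "not":
        return not eval_query(args[0], present)
    raise ValueError(f"unknown operator: {op!r}")


def matches_span(query, doc, first, last):
    """doc is a list of (position, lexeme) sorted by position; indexes are inclusive."""
    # The local set replaces the istrue flags: nothing is written to shared structs.
    present = {lexeme for _, lexeme in doc[first:last + 1]}
    return eval_query(query, present)
```

The cost is that the set is rebuilt for every candidate span, whereas the flag
approach updates state incrementally as the extent grows.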
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru)
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83