Re: gsoc, oprrest function for text search

From: Jan Urbański <j(dot)urbanski(at)students(dot)mimuw(dot)edu(dot)pl>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gsoc, oprrest function for text search
Date: 2008-07-29 07:27:11
Message-ID: 488EC64F.20701@students.mimuw.edu.pl
Lists: pgsql-hackers

Heikki Linnakangas wrote:
> Jan Urbański wrote:
>> Here's a WIP patch implementing an oprrest function for tsvector @@
>> tsquery and tsquery @@ tsvector.
>>
>> The idea is (quoting a comment)
>> /*
>> * Traverse the tsquery preorder, calculating selectivity as:
>> *
>> * selec(left_oper) * selec(right_oper) in AND nodes,
>> *
>> * selec(left_oper) + selec(right_oper) -
>> * selec(left_oper) * selec(right_oper) in OR nodes,
>> *
>> * 1 - selec(oper) in NOT nodes
>> *
>> * freq[val] in VAL nodes, if the value is in MCELEM
>> * min(freq[MCELEM]) / 2 in VAL nodes, if it is not
>
> Seems reasonable.
>
>> *
>> * Implementation-wise, we sort the MCELEM array to use binary
>> * search on it.
>> */
>
> Would it be possible to store the array in sorted order, to avoid
> sorting it on every invocation of tssel?
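
For readers following along, the recursion in the quoted comment can be
sketched in standalone C. This is only an illustration, not the code from
the patch: QNode, QKind and selec() are made-up names, and the sketch
assumes each VAL node already carries the frequency looked up in MCELEM
(negative when the lexeme was not found).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node kinds; the real tsquery representation differs. */
typedef enum { Q_VAL, Q_AND, Q_OR, Q_NOT } QKind;

typedef struct QNode {
    QKind               kind;
    double              freq;   /* VAL only: MCELEM frequency, < 0 if absent */
    const struct QNode *left;   /* AND/OR: left operand; NOT: sole operand */
    const struct QNode *right;  /* AND/OR: right operand; unused otherwise */
} QNode;

/*
 * Estimate selectivity for the subtree rooted at node.
 * min_freq is the smallest frequency present in MCELEM; a lexeme that is
 * not in MCELEM is assumed to be at most half as frequent as that.
 */
static double selec(const QNode *node, double min_freq)
{
    double l, r;

    assert(node != NULL);

    switch (node->kind) {
    case Q_VAL:
        return node->freq >= 0.0 ? node->freq : min_freq / 2.0;
    case Q_NOT:
        return 1.0 - selec(node->left, min_freq);
    case Q_AND:
        return selec(node->left, min_freq) * selec(node->right, min_freq);
    case Q_OR:
        l = selec(node->left, min_freq);
        r = selec(node->right, min_freq);
        return l + r - l * r;
    }
    return 0.0; /* not reached for valid node kinds */
}
```

The AND and OR rules treat the operands as independent, which is the
same simplification the comment makes.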

It's being stored sorted on frequencies, like so:
[('dog', 0.9), ('cat', 0.8), ('sheep', 0.7)]
and I need it sorted on elements for bsearch().
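
To make the two orderings concrete, here is a minimal sketch of the
re-sort plus lookup, assuming a simplified entry type. McElem,
sort_by_element() and lookup_freq() are hypothetical names, and the
elements are plain NUL-terminated strings compared with strcmp(), which
is not how lexemes are actually stored in the statistics.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one MCELEM entry (element plus frequency). */
typedef struct {
    const char *element;
    double      frequency;
} McElem;

/* Order entries by element text; shared by qsort() and bsearch(). */
static int cmp_by_element(const void *a, const void *b)
{
    const McElem *x = a;
    const McElem *y = b;

    return strcmp(x->element, y->element);
}

/* Re-sort an array that arrives ordered by frequency. */
static void sort_by_element(McElem *arr, size_t n)
{
    assert(arr != NULL || n == 0);
    qsort(arr, n, sizeof(McElem), cmp_by_element);
}

/* Return the stored frequency of word, or -1.0 if it is not present. */
static double lookup_freq(const McElem *sorted, size_t n, const char *word)
{
    McElem        key = { word, 0.0 };
    const McElem *hit;

    hit = bsearch(&key, sorted, n, sizeof(McElem), cmp_by_element);
    return hit ? hit->frequency : -1.0;
}
```

With the array above, sort_by_element() would reorder it to cat, dog,
sheep, after which each lookup is O(log n) instead of a linear scan.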

I don't know if it's OK to break the rule that statistical data is
stored sorted on frequencies. If so, then ts_typanalyze() would have to
change and do one more qsort() before storing the result.

--
Jan Urbanski
GPG key ID: E583D7D2

ouden estin
