Re: gsoc, text search selectivity and dllist enhancments

From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Jan Urbański <j(dot)urbanski(at)students(dot)mimuw(dot)edu(dot)pl>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gsoc, text search selectivity and dllist enhancments
Date: 2008-07-10 20:27:31
Message-ID: 20080710202731.GH3757@alvh.no-ip.org
Lists: pgsql-hackers

Jan Urbański wrote:

> Oh, one important thing. You need to choose a bucket width for the LC
> algorithm, that is, decide after how many elements you will prune your
> data structure. I chose to prune after every twenty tsvectors.

Do you prune after X tsvectors regardless of the number of lexemes in
them? I don't think that preserves the algorithm's properties; if there's
a bunch of very short tsvectors and then long tsvectors, the pruning
would take place too early for the initial lexemes. I think you should
count lexemes, not tsvectors.
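
To make the suggestion concrete, here is a minimal sketch of Lossy Counting in which the bucket width is measured in lexemes, so a prune can happen in the middle of a long tsvector. It is an illustration only, not the code from the patch under discussion: the function name `lossy_count`, the `bucket_width` parameter, and the representation of a tsvector as a plain list of strings are all assumptions made for the example.

```python
from math import ceil


def lossy_count(tsvectors, bucket_width):
    """Lossy Counting over lexemes (hypothetical helper, not PostgreSQL code).

    Prunes after every `bucket_width` lexemes, independent of how the
    lexemes are grouped into tsvectors.
    """
    counts = {}  # lexeme -> [frequency, delta], delta = maximum undercount
    seen = 0     # total number of lexemes processed so far (N)

    for tsvector in tsvectors:
        for lexeme in tsvector:
            seen += 1
            bucket = ceil(seen / bucket_width)  # id of the current bucket

            if lexeme in counts:
                counts[lexeme][0] += 1
            else:
                # A new entry may have been missed in all earlier buckets.
                counts[lexeme] = [1, bucket - 1]

            # Bucket boundary: drop entries that cannot be frequent.
            if seen % bucket_width == 0:
                stale = [k for k, (f, d) in counts.items() if f + d <= bucket]
                for key in stale:
                    del counts[key]

    return {k: f for k, (f, d) in counts.items()}


# Two one-lexeme tsvectors followed by a longer one, bucket width of 3 lexemes.
result = lossy_count([["a"], ["b"], ["a", "c", "a", "d"]], 3)
```

With a width of three lexemes, the first prune fires after the first lexeme of the third tsvector and the second after its last one, so the pruning schedule depends only on how many lexemes have been seen. Counting tsvectors instead would make the effective bucket width vary with tsvector length.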

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
