Re: gsoc, text search selectivity and dllist enhancments

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Jan Urbański <j(dot)urbanski(at)students(dot)mimuw(dot)edu(dot)pl>, Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: gsoc, text search selectivity and dllist enhancments
Date:	2008-07-10 20:37:28
Message-ID:	13892.1215722248@sss.pgh.pa.us
Lists:	pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Jan Urbański wrote:
>> Oh, one important thing. You need to choose a bucket width for the LC
>> algorithm, that is, decide after how many elements you will prune your
>> data structure. I chose to prune after every twenty tsvectors.

> Do you prune after X tsvectors regardless of the number of lexemes in
> them? I don't think that preserves the algorithm's properties; if there's
> a bunch of very short tsvectors and then long tsvectors, the pruning
> would take place too early for the initial lexemes. I think you should
> count lexemes, not tsvectors.

Yeah. I haven't read the Lossy Counting paper in detail yet, but I
suspect that the mathematical proof of limited error doesn't work if the
pruning is done at variable intervals. I don't see anything very wrong
with pruning intra-tsvector; the effects ought to average out, since the
point where you prune is going to move around with respect to the
tsvector boundaries.

regards, tom lane
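
To illustrate the point about counting lexemes rather than tsvectors, here
is a minimal sketch of Lossy Counting with a fixed bucket width measured in
lexemes, so pruning can land in the middle of a tsvector. It is not the
PostgreSQL code: the names lossy_count, epsilon and width are assumptions
made for this example, and a tsvector is modelled as a plain list of
lexeme strings.

```python
import math


def lossy_count(tsvectors, epsilon):
    """Lossy Counting over the lexemes of a sequence of tsvectors.

    Hypothetical helper for illustration only. The bucket width is counted
    in lexemes, not tsvectors, so pruning happens at a fixed spacing.
    """
    width = math.ceil(1 / epsilon)  # bucket width w, in lexemes
    counts = {}                     # lexeme -> [frequency f, max error delta]
    seen = 0                        # N: lexemes processed so far
    bucket = 1                      # id of the current bucket

    for tsvector in tsvectors:
        for lexeme in tsvector:
            seen += 1
            if lexeme in counts:
                counts[lexeme][0] += 1
            else:
                # A new entry may have been missed in all earlier buckets.
                counts[lexeme] = [1, bucket - 1]

            # Prune at every bucket boundary, even inside a tsvector.
            if seen % width == 0:
                stale = [k for k, (f, d) in counts.items() if f + d <= bucket]
                for key in stale:
                    del counts[key]
                bucket += 1

    return counts, seen


if __name__ == "__main__":
    # Short tsvectors followed by a long one.
    stream = [["a"], ["a", "b"], ["a", "c", "a", "d", "a"]]
    print(lossy_count(stream, epsilon=0.25))
```

With epsilon = 0.25 the bucket width is four lexemes, so the first prune
happens on the first lexeme of the third tsvector and the second prune on
its last lexeme. Each surviving frequency undercounts the true frequency by
at most epsilon times the number of lexemes seen.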

In response to

Re: gsoc, text search selectivity and dllist enhancments at 2008-07-10 20:27:31 from Alvaro Herrera
