Quick Links

Re: gsoc, text search selectivity and dllist enhancments

From:	Jan Urbański <j(dot)urbanski(at)students(dot)mimuw(dot)edu(dot)pl>
To:	Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: gsoc, text search selectivity and dllist enhancments
Date:	2008-07-11 06:23:05
Message-ID:	4876FC49.2000402@students.mimuw.edu.pl
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Oleg Bartunov wrote:
> On Wed, 9 Jul 2008, Jan Urbaski wrote:
>
>> Jan Urbaski wrote:
>> Do you think it's worthwhile to implement the LC algorithm in C and
>> send it out, so others could try it out? Heck, maybe it's worthwhile
>> to replace the current compute_minimal_stats() algorithm with LC and
>> see how that compares?
>
> I and Teodor are using LC for phrase estimation in one application and
> from our understanding of the original paper this algorithm might be
> not good for sampling, since all theory behind was about streaming of
> FULL data. As for technique we use suffix tree, which should be fine for
> typical sample size.

Hm, that's a good point. I'm only reasurred by the fact, that it yields
roughly the same results as the original algorithm, which means that if
LC is bad for sampling, then the current implementation is just as bad.
Come to think of it, the current code is in a way a variant of Lossy
Counting, it's just doing the pruning after each and every new element,
isn't it?

Cheers,
Jan

--
Jan Urbanski
GPG key ID: E583D7D2

ouden estin

In response to

Re: gsoc, text search selectivity and dllist enhancments at 2008-07-10 23:12:48 from Oleg Bartunov

Responses

Re: gsoc, text search selectivity and dllist enhancments at 2008-07-11 06:30:26 from Tom Lane

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Tom Lane	2008-07-11 06:30:26	Re: gsoc, text search selectivity and dllist enhancments
Previous Message	Jan Urbański	2008-07-11 06:18:25	Re: gsoc, text search selectivity and dllist enhancments