Re: gsoc, text search selectivity and dllist enhancments

From: Jan Urbański <j(dot)urbanski(at)students(dot)mimuw(dot)edu(dot)pl>
To: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gsoc, text search selectivity and dllist enhancments
Date: 2008-07-11 06:23:05
Message-ID: 4876FC49.2000402@students.mimuw.edu.pl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Oleg Bartunov wrote:
> On Wed, 9 Jul 2008, Jan Urbaski wrote:
>
>> Jan Urbaski wrote:
>> Do you think it's worthwhile to implement the LC algorithm in C and
>> send it out, so others could try it out? Heck, maybe it's worthwhile
>> to replace the current compute_minimal_stats() algorithm with LC and
>> see how that compares?
>
> I and Teodor are using LC for phrase estimation in one application and
> from our understanding of the original paper this algorithm might be
> not good for sampling, since all theory behind was about streaming of
> FULL data. As for technique we use suffix tree, which should be fine for
> typical sample size.

Hm, that's a good point. I'm only reasurred by the fact, that it yields
roughly the same results as the original algorithm, which means that if
LC is bad for sampling, then the current implementation is just as bad.
Come to think of it, the current code is in a way a variant of Lossy
Counting, it's just doing the pruning after each and every new element,
isn't it?

Cheers,
Jan

--
Jan Urbanski
GPG key ID: E583D7D2

ouden estin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2008-07-11 06:30:26 Re: gsoc, text search selectivity and dllist enhancments
Previous Message Jan Urbański 2008-07-11 06:18:25 Re: gsoc, text search selectivity and dllist enhancments