| From: | Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> | 
|---|---|
| To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> | 
| Cc: | Jan Urbański <j(dot)urbanski(at)students(dot)mimuw(dot)edu(dot)pl>, pgsql-hackers(at)postgresql(dot)org | 
| Subject: | Re: Google Summer of Code 2008 | 
| Date: | 2008-03-09 02:30:57 | 
| Message-ID: | Pine.LNX.4.64.0803090528410.10010@sn.sai.msu.ru | 
| Lists: | pgsql-hackers | 
On Sat, 8 Mar 2008, Tom Lane wrote:
> Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> writes:
>> On Sat, 8 Mar 2008, Jan Urbański wrote:
>>> I have a feeling that in many cases identifying the top 50 to 300 lexemes
>>> would be enough to talk about text search selectivity with a degree of
>>> confidence. At least we wouldn't give overly low estimates for queries
>>> looking for very popular words, which I believe is worse than giving an overly
>>> high estimate for an obscure query (am I wrong here?).
>
>> Unfortunately, selectivity estimation for a query is much more difficult than
>> just estimating the frequency of an individual word.
>
> It'd be an oversimplification, sure, but almost any degree of smarts
> would be a huge improvement over what we have now ...
Yes, given that the most popular queries are just one word long.
 	Regards,
 		Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru)
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83