
Re: Google Summer of Code 2008

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Jan Urbański <j(dot)urbanski(at)students(dot)mimuw(dot)edu(dot)pl>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Google Summer of Code 2008
Date: 2008-03-09 02:38:20
Lists: pgsql-hackers
On Sat, 8 Mar 2008, Jan Urbański wrote:

>> Unfortunately, selectivity estimation for a query is much more difficult than
>> just estimating the frequency of individual words.
> Sure, given something like 'cats & dogs'::tsquery, the frequencies of 'cat' and
> 'dog' alone won't suffice. But it's at least a starting point: if we estimate
> that 80% of the documents contain 'dog' and 70% contain 'cat', then we can tell
> for sure that at least 50% contain both, and that's a lot better than the 0.1%
> being returned now.

Certainly, yes. And given that the most popular queries are single-word
queries, this would be very helpful in most cases.
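For reference, the 50% figure in the quoted example is the standard lower bound on an intersection, P(A and B) >= P(A) + P(B) - 1, which generalises to n terms as sum(p_i) - (n - 1). The minimal Python sketch below computes the bounds one could derive from per-term frequencies alone, plus the estimate you get by assuming the terms occur independently. The function name and_selectivity_bounds is hypothetical and is not part of PostgreSQL.

```python
from math import prod


def and_selectivity_bounds(freqs: list[float]) -> tuple[float, float, float]:
    """Bounds on the fraction of documents matching every term of an AND query.

    freqs holds, for each term, the fraction of documents containing it (0..1).
    Returns (lower, upper, independent):
      lower       = max(0, sum(p) - (n - 1))   Frechet/Bonferroni lower bound
      upper       = min(p)                     the rarest term caps the intersection
      independent = prod(p)                    estimate if terms were uncorrelated
    """
    if not freqs:
        raise ValueError("need at least one term frequency")
    n = len(freqs)
    lower = max(0.0, sum(freqs) - (n - 1))
    upper = min(freqs)
    independent = prod(freqs)
    return lower, upper, independent


# 'dog' in 80% of documents, 'cat' in 70%, as in the quoted example
lower, upper, independent = and_selectivity_bounds([0.8, 0.7])
print(lower, upper, independent)
```

With [0.8, 0.7] this gives a lower bound of 0.5 and an upper bound of 0.7. The independence estimate (about 0.56) falls between them. The true value depends on how the terms are correlated, which is why per-word frequencies are only a starting point.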

The reason I thought about improving ts_stat() is that we could use its
statistics for the incomplete-search feature people have requested, where
an AND query like (a & b & c) is rewritten into a set of AND/OR queries
depending on how often each term occurs.
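One hypothetical way such a rewrite could use term frequencies is sketched below in Python. The policy (require as many terms as possible while the estimated match count stays above a threshold), the independence assumption, and the names relax_and_query and min_hits are all illustrative assumptions. They do not describe the feature as actually designed or any existing PostgreSQL API.

```python
from itertools import combinations
from math import prod


def relax_and_query(freqs: dict[str, float], total_docs: int, min_hits: int) -> str:
    """Rewrite an AND of terms into an OR of k-term ANDs (hypothetical policy).

    freqs maps each term to the fraction of documents containing it.
    Starting from k = n (the original AND), lower k until the estimated
    number of matching documents reaches min_hits; k = 1 is the fallback.
    """
    if not freqs:
        raise ValueError("need at least one term")
    terms = sorted(freqs)
    for k in range(len(terms), 0, -1):
        groups = list(combinations(terms, k))
        # Each group's selectivity assumes independent terms (product of
        # frequencies); summing over groups is a union upper bound, capped at 1.
        est = min(1.0, sum(prod(freqs[t] for t in g) for g in groups)) * total_docs
        if est >= min_hits or k == 1:
            return " | ".join("(" + " & ".join(g) + ")" for g in groups)
    raise AssertionError("unreachable: the k == 1 case always returns")


print(relax_and_query({"a": 0.8, "b": 0.7, "c": 0.01}, total_docs=1000, min_hits=100))
```

For example, with a = 0.8, b = 0.7, c = 0.01 over 1000 documents and min_hits = 100, the full AND is estimated at about 6 matches, so the sketch falls back to "(a & b) | (a & c) | (b & c)". A real implementation would more likely rank results by how many terms they match than issue one flat OR.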

Oleg Bartunov, Research Scientist, Head of AstroNet,
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su,
phone: +007(495)939-16-83, +007(495)939-23-83
