Re: multi terabyte fulltext searching

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Benjamin Arai <benjamin(at)araisoft(dot)com>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Postgresql <pgsql-general(at)postgresql(dot)org>
Subject: Re: multi terabyte fulltext searching
Date: 2007-03-21 16:35:56
Message-ID: Pine.LNX.4.64.0703211932250.12152@sn.sai.msu.ru
Lists: pgsql-general

On Wed, 21 Mar 2007, Benjamin Arai wrote:

> Can't you implement something similar to Google by aggregating results from
> TSearch2 over many machines?

tsearch2 doesn't use any global statistics, so, in principle, you
should be able to run fts on several machines and combine their results
using dblink (contrib/dblink).
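
A minimal sketch of the combining step, assuming each machine has already
returned its own hits sorted by rank in descending order (for example, rows
fetched through dblink). Because no global statistics are involved, ranks
computed on different machines can be compared directly, so a plain merge is
enough. The function name, the (doc_id, rank) row layout, and the sample
values are hypothetical and only for illustration.

```python
import heapq
from itertools import islice


def merge_shard_results(shard_results, limit=10):
    """Merge per-machine result lists into one global top-`limit` list.

    Each input list must already be sorted by rank, highest first.
    Rows are (doc_id, rank) tuples; this layout is an assumption.
    """
    # heapq.merge lazily interleaves the pre-sorted inputs, so only
    # `limit` rows are ever pulled from the merged stream.
    merged = heapq.merge(*shard_results, key=lambda row: row[1], reverse=True)
    return list(islice(merged, limit))


# Hypothetical per-machine results, each sorted by rank descending.
shard_a = [("a1", 0.9), ("a2", 0.4)]
shard_b = [("b1", 0.7), ("b2", 0.5), ("b3", 0.1)]

top = merge_shard_results([shard_a, shard_b], limit=3)
print(top)
```

Each machine only needs to return its own top `limit` rows, since no row
below that cutoff on any one machine can appear in the global top `limit`.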

>
> Benjamin
> On Mar 21, 2007, at 8:59 AM, Teodor Sigaev wrote:
>
>> I'm afraid that full-text search over a multi-terabyte set of documents
>> cannot be implemented on any RDBMS, at least not on a single box.
>> Specialized full-text search engines (with exact matching and a search time
>> of about one second) have a practical limit of about 20 million documents;
>> a cluster, about 100 million. Bigger collections require engines like Google's.
>>
>>
>> --
>> Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
>> WWW: http://www.sigaev.ru/
>>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
