Re: multi terabyte fulltext searching

From: Benjamin Arai <benjamin(at)araisoft(dot)com>
To: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Cc: Postgresql <pgsql-general(at)postgresql(dot)org>
Subject: Re: multi terabyte fulltext searching
Date: 2007-03-21 16:01:31
Message-ID: 939DC5F2-448B-4CC9-A1F4-891329172F67@araisoft.com
Lists: pgsql-general

By the way, what is the largest TSearch2 database that you know of
and how fast does it return results? Maybe my expectations are
unrealistic.

Benjamin

On Mar 21, 2007, at 8:42 AM, Oleg Bartunov wrote:

> Benjamin,
>
> as one of the authors of tsearch2, I'd like to know more about your
> setup.
> tsearch2 in 8.2 has GIN index support, which scales much better
> than the old
> GiST index.
>
> Oleg
>
> On Wed, 21 Mar 2007, Benjamin Arai wrote:
>
>> Hi,
>>
>> I have been struggling to get fulltext searching working for very
>> large databases. I can fulltext index 10s of gigs without any
>> problem, but when I start getting to hundreds of gigs it becomes
>> slow. My current system is a quad core with 8GB of memory. I
>> have the resources to throw more hardware at it, but realistically
>> it is not cost effective to buy a system with 128GB of memory. Are
>> there any solutions that people have come up with for indexing
>> very large text databases?
>>
>> Essentially I have several terabytes of text that I need to
>> index. Each record is about 5 paragraphs of text. I am currently
>> using TSearch2 (stemming, etc.) and getting sub-optimal
>> results. Queries take more than a second to execute. Has anybody
>> implemented such a database using multiple systems or some special
>> add-on to TSearch2 to make things faster? I want to do something
>> like partitioning the data into multiple systems and merging the
>> ranked results at some master node. Is something like this
>> possible for PostgreSQL or must it be a software solution?
>>
>> Benjamin
>>
>
> Regards,
> Oleg
> _____________________________________________________________
> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
> Sternberg Astronomical Institute, Moscow University, Russia
> Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
> phone: +007(495)939-16-83, +007(495)939-23-83
>
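
The "partition the data and merge the ranked results at a master node" idea from the quoted message can be sketched in a few lines. This is only an illustration, not something proposed in the thread: the `Hit` tuple shape, the `merge_ranked` helper, and the sample document ids are all hypothetical, and it assumes each partition returns its hits already sorted by descending rank with scores that are comparable across partitions.

```python
import heapq
from itertools import islice
from typing import Iterable, List, Tuple

# Hypothetical hit shape: (rank score, document id).
Hit = Tuple[float, str]


def merge_ranked(shard_results: Iterable[List[Hit]], limit: int) -> List[Hit]:
    """Merge per-shard hit lists into one global top-`limit` list.

    Assumes every input list is already sorted by descending score,
    as a per-shard "ORDER BY rank DESC LIMIT n" query would return it.
    """
    # heapq.merge streams the sorted inputs lazily, so the master node
    # only pulls as many hits as it needs from each shard's list.
    merged = heapq.merge(*shard_results, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, limit))


# Made-up results from two partitions, each sorted by descending score.
shard_a = [(0.91, "a:17"), (0.40, "a:3")]
shard_b = [(0.87, "b:52"), (0.55, "b:8"), (0.10, "b:1")]

top_hits = merge_ranked([shard_a, shard_b], limit=3)
print(top_hits)
```

Because each partition only has to return its own top `limit` hits, the master node merges at most `limit` rows per partition regardless of how much text each one indexes.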
