Re: multi terabyte fulltext searching

From: Benjamin Arai <benjamin(at)araisoft(dot)com>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: Postgresql <pgsql-general(at)postgresql(dot)org>
Subject: Re: multi terabyte fulltext searching
Date: 2007-03-21 16:14:31
Message-ID: DA88D23F-A0AB-4154-9463-12D73FFA9066@araisoft.com
Lists: pgsql-general

24.

Benjamin

On Mar 21, 2007, at 9:09 AM, Joshua D. Drake wrote:

> Benjamin Arai wrote:
>> True, but what happens when my database reaches 100 terabytes? Is 5
>> seconds ok? How about 10? My problem is that I do not believe the
>> performance loss I am experiencing as the data becomes large is
>> logarithmic in the number of records. This worries me because I could
>> be doing something wrong. Or I might be able to do something better.
>
> Well a couple of things you could do, especially if you have the
> ability to throw hardware at it.
>
> How many spindles do you have?
>
> J
>
>
>>
>> Benjamin
>>
>> On Mar 21, 2007, at 8:49 AM, Joshua D. Drake wrote:
>>
>>> Benjamin Arai wrote:
>>>> Hi,
>>>>
>>>> I have been struggling with getting fulltext searching to work for
>>>> very large databases. I can fulltext index 10s of gigs without any
>>>> problem, but when I start getting to hundreds of gigs it becomes slow.
>>>> My current system is a quad core with 8GB of memory. I have the
>>>> resources to throw more hardware at it, but realistically it is not
>>>> cost effective to buy a system with 128GB of memory. Are there any
>>>> solutions that people have come up with for indexing very large
>>>> text databases?
>>>
>>> GIST indexes are very large.
>>>
>>>> Essentially I have several terabytes of text that I need to index.
>>>> Each record is about 5 paragraphs of text. I am currently using
>>>> TSearch2 (stemming, etc.) and getting sub-optimal results. Queries
>>>> take more than a second to execute.
>>>
>>> You are complaining about more than a second with a terabyte of text?
>>>
>>>
>>>> Has anybody implemented such a database using multiple systems or
>>>> some special add-on to TSearch2 to make things faster? I want to do
>>>> something like partitioning the data into multiple systems and
>>>> merging the ranked results at some master node. Is something like
>>>> this possible for PostgreSQL or must it be a software solution?
>>>>
>>>> Benjamin
>>>>
>>>
>>>
>>> --
>>> === The PostgreSQL Company: Command Prompt, Inc. ===
>>> Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
>>> Providing the most comprehensive PostgreSQL solutions since 1997
>>> http://www.commandprompt.com/
>>>
>>> Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
>>> PostgreSQL Replication: http://www.commandprompt.com/products/
>>>
>>
>>
>

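The partition-and-merge idea from the original question (split the text across several systems, run the search on each, merge the ranked results at a master node) could be sketched on the application side roughly as below. This is only a minimal sketch under stated assumptions: the name `merge_ranked` and the `(rank, doc_id)` hit format are hypothetical, each partition is assumed to return its hits already sorted by descending rank, and the rank values are assumed to be comparable across partitions.

```python
import heapq
from itertools import islice
from typing import Iterable, List, Tuple

# Hypothetical hit format: (rank score, document id).
Hit = Tuple[float, str]


def merge_ranked(shard_results: Iterable[List[Hit]], limit: int) -> List[Hit]:
    """Merge per-partition hit lists into one list of the top `limit` hits.

    Assumes every input list is already sorted by descending rank,
    e.g. because each partition ran its query with ORDER BY rank DESC.
    """
    # heapq.merge lazily interleaves the sorted inputs; reverse=True tells it
    # the inputs are ordered from largest to smallest rank.
    merged = heapq.merge(*shard_results, key=lambda hit: hit[0], reverse=True)
    # Only the first `limit` hits are pulled from the merged stream.
    return list(islice(merged, limit))


if __name__ == "__main__":
    shard_a = [(0.9, "a1"), (0.4, "a2")]
    shard_b = [(0.7, "b1"), (0.1, "b2")]
    for rank, doc_id in merge_ranked([shard_a, shard_b], limit=3):
        print(rank, doc_id)
```

Each partition only needs to return its own top `limit` hits, since no lower-ranked hit from a partition can appear in the global top `limit`; that keeps the work at the master node small regardless of how large each partition is.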