From: Arturo Perez <aperez(at)hayesinc(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: multi terabyte fulltext searching
Date: 2007-03-22 18:57:10
Message-ID: pan.2007.03.22.18.57.10.22444@hayesinc.com
Lists: pgsql-general
On Wed, 21 Mar 2007 08:57:39 -0700, Benjamin Arai wrote:
> Hi Oleg,
>
> I am currently using GiST indexes because I receive about 10GB of new
> data a week (and I am not deleting any information). I do not expect to
> stop receiving text for about 5 years, so the data is not going to
> become static any time soon. The reason I am concerned with performance
> is that I am providing a search system for several newspapers going
> back essentially to the beginning of time. Many bibliographers and
> other researchers would like to use this utility, but if each search
> takes too long I will not be able to support many concurrent users.
>
> Benjamin
>
At a previous job, I built a system to do this. We had 3,000 publications
and approx 70M newspaper articles. Total content size (postprocessed) was
on the order of 100GB or more, IIRC. We used a proprietary (closed-source,
not ours) search engine.
In order to reach subsecond response time we needed to horizontally scale
to about 50-70 machines, each a low-end Dell 1650. This was after about 5
years of trying to vertically scale.
-arturo
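
The GiST-vs-GIN trade-off raised in the quoted message can be illustrated
with a short SQL sketch. Table and column names here are hypothetical, and
it assumes the tsearch2 contrib module with a tsvector column (GIN index
support for tsvector arrived in PostgreSQL 8.2):

```sql
-- Hypothetical schema; assumes tsearch2 (contrib in PostgreSQL 8.x).
CREATE TABLE articles (
    id    serial PRIMARY KEY,
    body  text,
    tsv   tsvector   -- typically maintained by a trigger on INSERT/UPDATE
);

-- GiST: cheaper to update, so it suits a steady 10GB/week append load.
CREATE INDEX articles_tsv_gist ON articles USING gist (tsv);

-- GIN (8.2+): slower to build and update, but usually much faster to
-- search -- often a better fit as the data set grows large.
-- CREATE INDEX articles_tsv_gin ON articles USING gin (tsv);

-- A typical full-text query against either index:
SELECT id FROM articles WHERE tsv @@ to_tsquery('newspaper & archive');
```
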