Re: Efficient Searching of Large Text Fields

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Aaron Bono <postgresql(at)aranya(dot)com>
Cc: Rod Taylor <pg(at)rbt(dot)ca>, PostgreSQL <pgsql-sql(at)postgresql(dot)org>
Subject: Re: Efficient Searching of Large Text Fields
Date: 2006-06-14 07:17:19
Message-ID: Pine.GSO.4.63.0606141114100.10866@ra.sai.msu.su
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

On Tue, 13 Jun 2006, Aaron Bono wrote:

> I will look that tsearch (at . It appears their 8.1.x version is
> still in development and I use PostgreSQL 8.1.3 but it is worth trying
> - I'm not in a hurry for that feature anyway.
>
> I also looked at PHPBB a little - it appears their database stores
> words but the code is so difficult to dig through I was not sure about
> their implementation or even what they used it for. Would it be worth
> the work to save the text into a separate searchable table that kept
> individual words and word counts or would that be more work and eat up
> more space than it is worth? You could actually index the words that
> way and get much quicker searches. Then again, as I read through
> tsearch, it may make this approach unnecessary...
>
> I have also seen what looks like people using search results tables
> that, after a search is performed, save a list of the results. For
> example, if I were doing a search of a forum, I could save the search
> in a table like this:
>
> forum_topic
> forum_topic_id (PK)
> forum_topic_name
> etc...
>
> forum_topic_search
> forum_topic_search_id (PK)
> forum_topic_search_dt
> forum_topic_search_desc
>
> forum_topic_search_results
> forum_topic_search_results_id (PK)
> forum_topic_search_id (FK)
> sort_index (int to tell us the order the results are returned in)
> forum_topic_id (FK)
>
> This way you can allow users to page through the results without
> having to constantly research or cache the results somewhere in
> memory.
>
> Has anyone tried an approach like this?
>
> When do you clean these search tables out? They could get quite large
> after a while.

You might be surprized, but queries in general are very similar.
Analyze your search log, after normalization you could estimate the total
number of distinct queries. We developed search daemon for the big blog
and it worked quite well.

>
> Thanks!
> Aaron
>
> On 6/13/06, Rod Taylor <pg(at)rbt(dot)ca> wrote:
>> It won't help at all. Fast partial matches against arrays is nearly
>> impossible. You might take a look at tsearch though.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faq
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Achilleus Mantzios 2006-06-14 07:24:11 Re: Efficient Searching of Large Text Fields
Previous Message Oleg Bartunov 2006-06-14 07:07:41 Re: Efficient Searching of Large Text Fields