Re: Tuning tsearch searching and indexing

From: Eric Walstad <eric(at)ericwalstad(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: sfpug(at)postgresql(dot)org
Subject: Re: Tuning tsearch searching and indexing
Date: 2009-02-11 01:54:15
Message-ID: c436fcd20902101754h1a34d5b7wf3a6a231abc6de28@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: sfpug

On Tue, Feb 10, 2009 at 4:53 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> Eric,
>
>> Our 'big' table is now at ~15.7M records, it holds address data with a
>> few other fields, too. We were using Xapian to index the address
>> data. Our users need to search the address data frequently. We want
>> to try using tsearch over Xapian for various reasons but we're having
>> difficulties getting the indexes built. For example, after running
>> for about 30hours, the indexing operation failed when a scheduled
>> vacuum started:
>>
>> er-ca=# UPDATE er_service_address
>> er-ca-# SET search_tsv=to_tsvector(
>> er-ca(# 'default',
>> er-ca(# COALESCE(service_account, '') ||' '||
>> er-ca(# COALESCE(customer_name, '') ||' '||
>> er-ca(# COALESCE(addr1, '') ||' '||
>> er-ca(# COALESCE(city, '') ||' '||
>> er-ca(# COALESCE(zipcode, '') ||' '||
>> er-ca(# COALESCE(service_type, '') ||' '||
>> er-ca(# COALESCE(account_type, '')
>> er-ca(# )
>> er-ca-# ;
>
> Well, first, you're going to become an early adopter of 8.4 because of the
> multi-column TSearch indexes.
>
> Second, I'd strongly suggest that you put the concatenated column for the
> TSearch index into another table, er_service_address_fts. This both would
> make the above operation much faster, and would allow you to adopt
> asynchronous updating of the FTI if you find that the cost of updating it
> makes updates to the primary data too slow.
>
> --Josh

Hm, I'm a little nervous to dive into 8.4 before it's released,
especially for a production db. It looks like[1] it's not yet into
beta? Still, good to know that multi-column idexes are on the way.
I'll give the separate table a try and hope for a performance improvement.

Do you think there'd be any benefit to creating the tsearch indexes in
batched updates, say, on 100,000 record batches (by pk (serial))? I'm
not clear on how updates are handled internally and am assuming
there's one big transaction that might be broken into many smaller
transactions.

Thank you Josh.

Eric.
[1] http://wiki.postgresql.org/wiki/PostgreSQL_8.4_Development_Plan

In response to

Browse sfpug by date

  From Date Subject
Next Message Josh Berkus 2009-02-11 22:23:18 Wanna go to SCALE? Free transport, crash space
Previous Message Josh Berkus 2009-02-11 00:53:29 Re: Tuning tsearch searching and indexing