From: | Eric Walstad <eric(at)ericwalstad(dot)com> |
---|---|
To: | Josh Berkus <josh(at)agliodbs(dot)com> |
Cc: | sfpug(at)postgresql(dot)org |
Subject: | Re: Tuning tsearch searching and indexing |
Date: | 2009-02-11 01:54:15 |
Message-ID: | c436fcd20902101754h1a34d5b7wf3a6a231abc6de28@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | sfpug |
On Tue, Feb 10, 2009 at 4:53 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> Eric,
>
>> Our 'big' table is now at ~15.7M records, it holds address data with a
>> few other fields, too. We were using Xapian to index the address
>> data. Our users need to search the address data frequently. We want
>> to try using tsearch over Xapian for various reasons but we're having
>> difficulties getting the indexes built. For example, after running
>> for about 30hours, the indexing operation failed when a scheduled
>> vacuum started:
>>
>> er-ca=# UPDATE er_service_address
>> er-ca-# SET search_tsv=to_tsvector(
>> er-ca(# 'default',
>> er-ca(# COALESCE(service_account, '') ||' '||
>> er-ca(# COALESCE(customer_name, '') ||' '||
>> er-ca(# COALESCE(addr1, '') ||' '||
>> er-ca(# COALESCE(city, '') ||' '||
>> er-ca(# COALESCE(zipcode, '') ||' '||
>> er-ca(# COALESCE(service_type, '') ||' '||
>> er-ca(# COALESCE(account_type, '')
>> er-ca(# )
>> er-ca-# ;
>
> Well, first, you're going to become an early adopter of 8.4 because of the
> multi-column TSearch indexes.
>
> Second, I'd strongly suggest that you put the concatenated column for the
> TSearch index into another table, er_service_address_fts. This both would
> make the above operation much faster, and would allow you to adopt
> asynchronous updating of the FTI if you find that the cost of updating it
> makes updates to the primary data too slow.
>
> --Josh
Hm, I'm a little nervous to dive into 8.4 before it's released,
especially for a production db. It looks like[1] it's not yet into
beta? Still, good to know that multi-column idexes are on the way.
I'll give the separate table a try and hope for a performance improvement.
Do you think there'd be any benefit to creating the tsearch indexes in
batched updates, say, on 100,000 record batches (by pk (serial))? I'm
not clear on how updates are handled internally and am assuming
there's one big transaction that might be broken into many smaller
transactions.
Thank you Josh.
Eric.
[1] http://wiki.postgresql.org/wiki/PostgreSQL_8.4_Development_Plan
From | Date | Subject | |
---|---|---|---|
Next Message | Josh Berkus | 2009-02-11 22:23:18 | Wanna go to SCALE? Free transport, crash space |
Previous Message | Josh Berkus | 2009-02-11 00:53:29 | Re: Tuning tsearch searching and indexing |