Skip site navigation (1) Skip section navigation (2)

Tuning tsearch searching and indexing

From: Eric Walstad <eric(at)ericwalstad(dot)com>
To: sfpug(at)postgresql(dot)org
Subject: Tuning tsearch searching and indexing
Date: 2009-02-10 18:00:13
Message-ID: c436fcd20902101000w66d98971ub0b49c26852fdad6@mail.gmail.com (view raw or flat)
Thread:
Lists: sfpug
Hi All,

Our 'big' table is now at ~15.7M records, it holds address data with a
few other fields, too.  We were using Xapian to index the address
data.  Our users need to search the address data frequently.  We want
to try using tsearch over Xapian for various reasons but we're having
difficulties getting the indexes built.  For example, after running
for about 30hours, the indexing operation failed when a scheduled
vacuum started:

er-ca=# UPDATE er_service_address
er-ca-# SET search_tsv=to_tsvector(
er-ca(#         'default',
er-ca(#         COALESCE(service_account, '') ||' '||
er-ca(#         COALESCE(customer_name, '') ||' '||
er-ca(#         COALESCE(addr1, '') ||' '||
er-ca(#         COALESCE(city, '') ||' '||
er-ca(#         COALESCE(zipcode, '') ||' '||
er-ca(#         COALESCE(service_type, '') ||' '||
er-ca(#         COALESCE(account_type, '')
er-ca(#     )
er-ca-# ;


WARNING:   server process
DETAIL:  The postmaster has commanded this server process to roll back
the current transaction and exit, because another server process
exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>

I'm looking for advice on using tsearch on a dataset of this size,
specifically how to speed up the generation of the indexes to units of
whole hours rather than whole days.

Thanks in advance for your advice.  Details of the target system are
below; let me know if there's other settings I should post.

Eric.

System info:

$ uname -rvpio
2.6.9-42.0.2.EL #1 Thu Aug 17 17:37:40 EDT 2006 x86_64 x86_64 GNU/Linux

$ cat /etc/redhat-release
Red Hat Enterprise Linux ES release 4 (Nahant Update 4)

$ free
             total       used       free     shared    buffers     cached
Mem:       4059812    3631484     428328          0       9128    3450236
-/+ buffers/cache:     172120    3887692
Swap:      1052248      17664    1034584


# select name, setting from pg_settings where name in ([list of fields
from http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server]);
           name            |  setting
---------------------------+-----------
 autovacuum                | on
 checkpoint_segments       | 64
 default_statistics_target | 100
 effective_cache_size      | 256000
 max_connections           | 100
 max_fsm_pages             | 840000
 max_fsm_relations         | 50000
 random_page_cost          | 4
 server_version            | 8.1.3
 shared_buffers            | 125000
 wal_buffers               | 100
 wal_sync_method           | fdatasync
 work_mem                  | 20480
(13 rows)


# select pg_size_pretty(pg_database_size('er-ca'));
 pg_size_pretty
----------------
 17 GB
(1 row)


# \d er_service_address
                                      Table "public.er_service_address"
     Column      |           Type           |
  Modifiers
-----------------+--------------------------+-----------------------------------------------------------------
 modified_by_id  | integer                  | not null
 created_by_id   | integer                  | not null
 modified_on     | timestamp with time zone | not null
 created_on      | timestamp with time zone |
 service_account | character varying(255)   | not null
 addr1           | character varying(255)   | not null
 addr2           | character varying(255)   |
 city            | character varying(255)   |
 state           | character varying(2)     | not null
 zipcode         | character varying(5)     |
 zip4            | character varying(4)     |
 address_hash    | character varying(128)   |
 fuzzy_hash      | character varying(128)   |
 ut_id           | integer                  | not null
 meter_number    | character varying(80)    |
 customer_name   | character varying(255)   |
 service_type    | character varying(1)     |
 account_type    | character varying(1)     |
 status          | character varying(2)     |
 id              | integer                  | not null default
nextval('er_service_address_id_seq'::regclass)
 search_tsv      | tsvector                 |
Indexes:
    "er_service_address_pkey" PRIMARY KEY, btree (id)
    "er_service_address_account_type" btree (account_type)
    "er_service_address_addr1" btree (addr1)
    "er_service_address_address_hash" btree (address_hash)
    "er_service_address_city" btree (city)
    "er_service_address_created_by_id" btree (created_by_id)
    "er_service_address_customer_name" btree (customer_name)
    "er_service_address_service_account" btree (service_account)
    "er_service_address_service_type" btree (service_type)
    "er_service_address_status" btree (status)
    "er_service_address_ut_id" btree (ut_id)
    "er_service_address_zip4" btree (zip4)
    "er_service_address_zipcode" btree (zipcode)
Foreign-key constraints:
    "$1" FOREIGN KEY (modified_by_id) REFERENCES auth_user(id)
    "$2" FOREIGN KEY (created_by_id) REFERENCES auth_user(id)
    "$3" FOREIGN KEY (ut_id) REFERENCES er_utility(id)

Responses

sfpug by date

Next:From: Steve CrawfordDate: 2009-02-10 19:00:43
Subject: Anyone need a ride?
Previous:From: Josh BerkusDate: 2009-02-10 00:00:48
Subject: Re: pgDay San Francisco

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group