Creation of tsearch2 index is very slow

From: Stephan Vollmer <svollmer(at)gmx(dot)de>
To: pgsql-general(at)postgresql(dot)org
Subject: Creation of tsearch2 index is very slow
Date: 2006-01-20 14:01:59
Message-ID: 43D0ED57.1020608@gmx.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-performance

Hello!

I noticed that the creation of a GIST index for tsearch2 takes very
long - about 20 minutes. CPU utilization is 100 %, the resulting
index file size is ~25 MB. Is this behaviour normal?

Full text columns: title author_list
tsearch2 word lists: fti_title fti_author_list
tsearch2 indexes: idx_fti_title idx_fti_author_list

The table has 700,000 records. When I create a normal B-Tree index
on the same column for testing purposes, it works quite fast
(approx. 30 seconds).

The columns that should be indexed are small, only about 10 words on
average.

System specs:
Athlon64 X2 3800+, 2 GB RAM
PostgreSQL 8.1.2, Windows XP SP2

I've never noticed this problem before, so could it probably be
related to v8.1.2? Or is something in my configuration or table
definition that causes this sluggishness?

Thanks very much in advance for your help!

- Stephan

This is the table definition:
-----------------------------------------------------------------
CREATE TABLE publications
(
id int4 NOT NULL DEFAULT nextval('publications_id_seq'::regclass),
publication_type_id int4 NOT NULL DEFAULT 0,
keyword text NOT NULL,
mdate date,
"year" date,
title text,
fti_title tsvector,
author_list text,
fti_author_list tsvector,
overview_timestamp timestamp,
overview_xml text,
CONSTRAINT publications_pkey PRIMARY KEY (keyword) USING INDEX
TABLESPACE dblp_index,
CONSTRAINT publications_publication_type_id_fkey FOREIGN KEY
(publication_type_id)
REFERENCES publication_types (id) MATCH SIMPLE
ON UPDATE RESTRICT ON DELETE RESTRICT,
CONSTRAINT publications_year_check CHECK (date_part('month'::text,
"year") = 1::double precision AND date_part('day'::text, "year") =
1::double precision)
)
WITHOUT OIDS;

CREATE INDEX fki_publications_publication_type_id
ON publications
USING btree
(publication_type_id)
TABLESPACE dblp_index;

CREATE INDEX idx_fti_author_list
ON publications
USING gist
(fti_author_list)
TABLESPACE dblp_index;

CREATE INDEX idx_fti_title
ON publications
USING gist
(fti_title)
TABLESPACE dblp_index;

CREATE INDEX idx_publications_year
ON publications
USING btree
("year")
TABLESPACE dblp_index;

CREATE INDEX idx_publications_year_part
ON publications
USING btree
(date_part('year'::text, "year"))
TABLESPACE dblp_index;

CREATE TRIGGER tsvectorupdate_all
BEFORE INSERT OR UPDATE
ON publications
FOR EACH ROW
EXECUTE PROCEDURE multi_tsearch2();

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Nikola Ivanov 2006-01-20 14:11:43 Re: logging connections
Previous Message Martijn van Oosterhout 2006-01-20 13:57:43 Re: How to convert Big5 to UTF8

Browse pgsql-performance by date

  From Date Subject
Next Message Stephan Vollmer 2006-01-20 14:17:07 Re: Creation of tsearch2 index is very slow
Previous Message Pandurangan R S 2006-01-20 09:35:49 Re: Retaining execution plans between connections?