Need to improve performance

From: Vassili A Akimov <vassili(at)erols(dot)com>
To: pgsql-sql(at)postgresql(dot)org, pgsql-general(at)postgresql(dot)org
Subject: Need to improve performance
Date: 2000-06-23 14:27:18
Message-ID: 395373C5.B1C2A566@erols.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-sql

Hallo,
My name is Vassili Akimov. I work with postgres, we maintain the main
database for our sponsors on it. One of thier requirements,- perform the

search through the long text field. Sometimes this field even exceeds
32k so we cut off the rest. But search trough this long field is too
slow. (it takes approximately 4 minutes of proccessor time on Intel
pentium II 266MHz). So we were advised to use "fulltextindex".- chunk
this fields on single words and make new table with words and oids in
it. After we made this table its size was 2940360 records. And I tried
to measure the time:

Create table app_fti(string varchar(32),id oid);

fill this table with words and oid.(as it explained in
fulltextindex(update main table))

create index app_fti_idx on app_fti(string,id);
cluster app_fti_idx on app_fti;
vacuum;
vacuum analyze;
select f1.id from app_fti f1, app_fti f2 where f1.string~*'visual' and
f2.string~*'basic' and f1.id=f2.id;

this select takes about the same time that the select we used for
searching through whole original text field. So we tried slightly
different approach:

Create table app_fti(string varchar(32),id oid);
fill this table with words and oid. (as it explained in fulltextindex)
create index app_fti_idx on app_fti(string);
create index app_fti_id_idx on app_fti(id);
cluster app_fti_idx on app_fti;
vacuum;
vacuum analyze;
select f1.id from app_fti f1, app_fti f2 where f1.string~*'visual' and
f2.string~*'basic' and f1.id=f2.id;

this select takes slightly less time but not significant. And if we
would add one more word in search criteria, it would add 1 more minute
to the search time. So we can't use this for our database.
the question is can we do something to make this type of search faster?
we need to make performace of our search at least twice faster than it
is now.
original table and search we do now are:

Table "applicant"
Attribute | Type | Modifier

-----------------------------+-----------+------------------------------------

hours/shift | char(70) |
position | text |
type of employment | integer |
date of applying | date | default date(now())
time of applying | time | default "time"(now())
salary | integer |
salary type | char(25) | not null default 'anually'
last name | char(20) |
first name | char(20) |
mid name | char(20) |
file reference | char(100) |
street address | text |
city | char(25) |
state | char(2) |
zip code | char(5) |
country | char(2) | default 'US'
permanent street address | text |
permanent city | char(25) |
permanent state | char(2) | default 'VA'
permanent zip code | char(5) |
permanent country | char(2) | default 'US'
home phone area code | smallint |
home phone number | integer |
home phone extension | char(10) |
alternative phone area code | smallint |
alternative phone number | integer |
alternative phone extension | char(10) |
work phone area code | smallint |
work phone number | integer |
work phone extension | char(10) |
fax area code | smallint |
fax number | bigint |
pager | bigint |
pager extension | char(10) |
e-mail | text |
url | text |
permition | boolean |
relocation | boolean |
travel | boolean |
highest grade | char(10) |
major_minor | text |
other skills | text |
reference number | bigint | default
nextval('reference'::text)
password | char(15) |
other | text |
supervisor | char(15) |
level | integer | default 7
aux1 | integer |
aux2 | integer |
aux3 | integer |
aux4 | integer |
aux5 | integer |
aux6 | date |
aux7 | time |

it also has index on oid;

the average select is:

SELECT CASE
WHEN "other skills"~*'[^a-zA-Z]visual[^a-zA-Z]'
then 1 else 0 end +
CASE WHEN "other skills"~*'[^a-zA-Z]basic[^a-zA-Z]'
then 1 else 0 end +
CASE WHEN "other skills"~*'[^a-zA-Z]web[^a-zA-Z]'
then 1 else 0
end as "count",
"first name", "last name",
"reference number",
"date of applying", "city", "state",
"e-mail", "home phone area code",
"home phone number",
"home phone extension" from "applicant"
WHERE "level">=2 AND
("other skills"~*'(^|[^a-zA-Z])unix($|[^a-zA-Z])'
and "other skills"~*'(^|[^a-zA-Z])oracle($|[^a-zA-Z])'
and "other skills"~*'(^|[^a-zA-Z])database($|[^a-zA-Z])')
order by "count" DESC
LIMIT 100;

this is the samle of our original query. We want to make it faster.
Number of criteria could be more than in this one. Our long field is
"other skills".

So can anybody advise us how we can improve performance of the search?

Thank you
vassili.

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Marc Britten 2000-06-23 14:39:04 RE: boolean isn't boolean?
Previous Message kaiq 2000-06-23 14:20:41 app-index in www.pgsql.com

Browse pgsql-sql by date

  From Date Subject
Next Message Mitch Vincent 2000-06-23 14:29:59 Orderby two different columns
Previous Message D'Arcy J.M. Cain 2000-06-23 11:40:25 Re: Merging two columns into one