Full text indexing preformance! (long)

From: "Mitch Vincent" <mitch(at)venux(dot)net>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: Full text indexing preformance! (long)
Date: 2000-05-28 18:42:30
Message-ID: 008e01bfc8d4$7b75c180$0300000a@doot.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Well I have to say that I'm pretty impressed with PostgreSQL after this..
Still, I'd like some input from the experts as I may not be doing the best I
can..

I setup the full text indexing (as described in contrib/fulltextindex) and
see some amazing results but like I said earlier, I might be able to do
better.

I took Tom's advice and turned on like planning (as described in
contrib/likeplanning) and it made a world of difference by itself..

Here is a quick run-down of the table structure.

Table : resumes_fti (coorespnds to the cds-fti table in the example)

25370953 rows.

Table applicants_resumes (cooresponds to the cds table in the example)

Table applicants (63 fields)
11039 rows.

Table applicants_states (2 fields)
276255 rows

The most complex query I use is this :

select a.appcode, a.firstname, a.middlename, a.lastname, a.state, a.degree1,
a.d1date, a.degree2, a.d2date, a.salary, a.skill1, a.skill2, a.skill3,
a.objective, a.employer, a.sic1, a.sic2, a.sic3, a.prefs1, a.prefs2,
a.sells from applicants as a,applicants_states as s, applicants_resumes as
ar,
resumes_fti as rf where a.status = 'A' and s.rstate='AL' and
s.app_id=a.app_id and rf.string ~'engineer' and rf.id = ar.oid limit 10
offset 0

-- BUT - I forgot one crucial thing. To qualify the results from the
applicants_resume table bases on the applicants table (ie ar.app_id =
a.app_id) I did this and the query went from just over 3 seconds to over 25
seconds!

I changes the above query to

select a.appcode, a.firstname, a.middlename, a.lastname, a.state, a.degree1,
a.d1date, a.degree2, a.d2date, a.salary, a.skill1, a.skill2, a.skill3,
a.objective, a.employer, a.sic1, a.sic2, a.sic3, a.prefs1, a.prefs2,
a.sells from applicants as a,applicants_states as s, applicants_resumes as
ar,
resumes_fti as rf where a.status = 'A' and s.rstate='AL' and rf.string
~'engineer'
and rf.id = ar.oid and s.app_id=a.app_id and ar.app_id=a.app_id limit 10
offset 0
(Listed below again)

Hopefully it's just something else stupid I am doing and someone will beat
me with a clue stick. All of this was done on a PostgreSQL 7.0 backend run
as "/usr/local/pgsql/bin/postmaster -D /usr/local/pgsql/data -B 4096 -o '-S
16384'
-i >/usr/local/pgsql/postgres.log 2>&1&" on a FreeBSD 4.0, Dual Celeron 600
box with an ATA/66 30 gig drive and 256 megs of RAM.

Here are some stats :

Without the extra condition :

NOTICE: QUERY PLAN:

Nested Loop (cost=0.00..487951.02 rows=1644 width=204)
-> Nested Loop (cost=0.00..478489.01 rows=2641 width=12)
-> Nested Loop (cost=0.00..474081.63 rows=1 width=8)
-> Seq Scan on resumes_fti rf (cost=0.00..474076.91 rows=1
width=4)
-> Index Scan using resumes_oid_index on applicants_resumes
ar (cost=0.00..4.70 rows=1 width=4)
-> Index Scan using applicants_states_rstate on applicants_states s
(cost=0.00..4380.98 rows=2641 width=4)
-> Index Scan using applicants_app_id on applicants a (cost=0.00..3.57
rows=1 width=192)

EXPLAIN

StartTransactionCommand
query: select a.appcode, a.firstname, a.middlename, a.lastname, a.state,
a.degree1,
a.d1date, a.degree2, a.d2date, a.salary, a.skill1, a.skill2, a.skill3,
a.objective, a.employer, a.sic1, a.sic2, a.sic3, a.prefs1, a.prefs2,
a.sells from applicants as a,applicants_states as s, applicants_resumes as
ar,
resumes_fti as rf where a.status = 'A' and s.rstate='AL' and rf.string
~'engineer'
and rf.id = ar.oid and s.app_id=a.app_id limit 10 offset 0;
ProcessQuery
! system usage stats:
! 3.386697 elapsed 2.599174 user 0.787057 system sec
! [2.617929 user 0.797100 sys total]
! 0/0 [0/0] filesystem blocks in/out
! 0/8330 [0/8569] page faults/reclaims, 0 [0] swaps
! 0 [0] signals rcvd, 0/0 [3/3] messages rcvd/sent
! 0/43 [3/47] voluntary/involuntary context switches
! postgres usage stats:
! Shared blocks: 4950 read, 0 written, buffer hit rate
= 20.03%
! Local blocks: 0 read, 0 written, buffer hit rate
= 0.00%
! Direct blocks: 0 read, 0 written
CommitTransactionCommand
proc_exit(0)
shmem_exit(0)
exit(0)

With the extra condition :

NOTICE: QUERY PLAN:

Nested Loop (cost=0.00..474194.76 rows=1 width=208)
-> Nested Loop (cost=0.00..474085.21 rows=1 width=204)
-> Nested Loop (cost=0.00..474081.63 rows=1 width=12)
-> Seq Scan on resumes_fti rf (cost=0.00..474076.91 rows=1
width=4)
-> Index Scan using resumes_oid_index on applicants_resumes
ar (cost=0.00..4.70 rows=1 width=8)
-> Index Scan using applicants_app_id on applicants a
(cost=0.00..3.57 rows=1 width=192)
-> Index Scan using applicants_states_app_id on applicants_states s
(cost=0.00..109.54 rows=1 width=4)

EXPLAIN

StartTransactionCommand
query: select a.appcode, a.firstname, a.middlename, a.lastname, a.state,
a.degree1,
a.d1date, a.degree2, a.d2date, a.salary, a.skill1, a.skill2, a.skill3,
a.objective, a.employer, a.sic1, a.sic2, a.sic3, a.prefs1, a.prefs2,
a.sells from applicants as a,applicants_states as s, applicants_resumes as
ar,
resumes_fti as rf where a.status = 'A' and s.rstate='AL' and rf.string
~'engineer'
and rf.id = ar.oid and s.app_id=a.app_id and ar.app_id=a.app_id limit 10
offset 0
ProcessQuery
! system usage stats:
! 25.503341 elapsed 18.564543 user 5.599631 system sec
! [18.564543 user 5.627987 sys total]
! 2029/0 [2029/0] filesystem blocks in/out
! 0/8335 [0/8571] page faults/reclaims, 0 [0] swaps
! 0 [0] signals rcvd, 0/0 [3/3] messages rcvd/sent
! 149/342 [152/346] voluntary/involuntary context switches
! postgres usage stats:
! Shared blocks: 35118 read, 0 written, buffer hit rate
= 4.98%
! Local blocks: 0 read, 0 written, buffer hit rate
= 0.00%
! Direct blocks: 0 read, 0 written
CommitTransactionCommand
proc_exit(0)
shmem_exit(0)
exit(0)

Sorry about the length. Thanks!

-Mitch

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2000-05-28 18:42:46 Re: Proposed cleanup of generated header files
Previous Message Tom Lane 2000-05-28 18:23:35 Proposed cleanup of generated header files