Re: Full text indexing performance! (long)

From: "Mitch Vincent" <mitch(at)huntsvilleal(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Full text indexing performance! (long)
Date: 2000-05-29 19:57:20
Message-ID: 005e01bfc9a8$1a3f2da0$0300000a@doot.org
Lists: pgsql-hackers

Hi Tom, thanks for your reply..

I increased BLCKSZ to 32k and re-compiled, then I imported all the resumes
(some of which I couldn't get in before) and this problem completely
disappeared. The query is very fast now (0.039792 seconds, to be exact)..

One thing I did run into was this...
In my paging system I only need 10 records at a time, so I LIMIT the query.
The problem comes when I need a total of all the records that matched the
query (as any good search engine should, I must tell people how many records
were found).. I can't count() and LIMIT in the same query, so I'm forced to
run two queries, one with count() and one without.
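The two-query pattern described here (one count() query and one LIMIT/OFFSET query sharing the same WHERE clause) can be sketched as follows. The table, columns, and helper function are hypothetical stand-ins, and SQLite is used only to make the sketch self-contained:

```python
import sqlite3

# Hypothetical in-memory table standing in for the applicants data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE applicants (app_id INTEGER, lastname TEXT, status TEXT)")
conn.executemany("INSERT INTO applicants VALUES (?, ?, ?)",
                 [(i, "name%d" % i, "A") for i in range(25)])

PAGE_SIZE = 10
WHERE = "status = 'A'"   # the shared search condition

def search_page(page):
    """Run the two-query paging pattern: one count(), one LIMIT/OFFSET."""
    # Query 1: total matches, for the "N records found" header.
    total = conn.execute(
        "SELECT count(*) FROM applicants WHERE " + WHERE).fetchone()[0]
    # Query 2: one page of rows; page n starts at OFFSET (n-1)*PAGE_SIZE.
    rows = conn.execute(
        "SELECT app_id, lastname FROM applicants WHERE " + WHERE +
        " LIMIT ? OFFSET ?", (PAGE_SIZE, (page - 1) * PAGE_SIZE)).fetchall()
    return total, rows

total, rows = search_page(3)
print(total, len(rows))  # prints: 25 5  (25 matches; the last page holds 5 rows)
```

The count() query still scans every matching row, which is why it dominates the cost once the LIMIT query itself becomes fast.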

An example :

select a.appcode, a.firstname, a.middlename, a.lastname, a.state, a.degree1,
       a.d1date, a.degree2, a.d2date, a.salary, a.skill1, a.skill2, a.skill3,
       a.objective, a.employer, a.sic1, a.sic2, a.sic3, a.prefs1, a.prefs2, a.sells
from applicants as a, applicants_states as s, applicants_resumes as ar, resumes_fti as rf
where a.status = 'A'
  and lower(a.firstname) ~ lower('^a')
  and s.rstate = 'AL'
  and rf.string ~ '^engineer'
  and a.app_id = s.app_id
  and ar.app_id = a.app_id
  and rf.id = ar.oid
limit 10 offset 0

Vs.

select count(a.app_id) as total
from applicants as a, applicants_states as s, applicants_resumes as ar, resumes_fti as rf
where a.status = 'A'
  and lower(a.firstname) ~ lower('^a')
  and s.rstate = 'AL'
  and rf.string ~ '^engineer'
  and a.app_id = s.app_id
  and ar.app_id = a.app_id
  and rf.id = ar.oid

The count() query has to go through the entire record set (which makes
sense), but it takes about 4 or 5 seconds.

The plan for the count() query:

NOTICE: QUERY PLAN:

Aggregate  (cost=56.61..56.61 rows=1 width=20)
  ->  Nested Loop  (cost=0.00..56.61 rows=1 width=20)
        ->  Nested Loop  (cost=0.00..10.74 rows=1 width=16)
              ->  Nested Loop  (cost=0.00..8.59 rows=1 width=12)
                    ->  Index Scan using resumes_fti_index on resumes_fti rf  (cost=0.00..4.97 rows=1 width=4)
                    ->  Index Scan using applicants_resumes_index on applicants_resumes ar  (cost=0.00..3.61 rows=1 width=8)
              ->  Index Scan using applicants_app_id on applicants a  (cost=0.00..2.14 rows=1 width=4)
        ->  Index Scan using applicants_states_app_id on applicants_states s  (cost=0.00..45.86 rows=1 width=4)

And the stats:

ProcessQuery
! system usage stats:
! 5.088647 elapsed 4.954981 user 0.125561 system sec
! [4.976752 user 0.132817 sys total]
! 0/0 [0/0] filesystem blocks in/out
! 0/4607 [0/4846] page faults/reclaims, 0 [0] swaps
! 0 [0] signals rcvd, 0/0 [3/3] messages rcvd/sent
! 0/52 [3/57] voluntary/involuntary context switches
! postgres usage stats:
!       Shared blocks: 0 read, 0 written, buffer hit rate = 100.00%
!       Local blocks: 0 read, 0 written, buffer hit rate = 0.00%
! Direct blocks: 0 read, 0 written
CommitTransactionCommand
proc_exit(0)

The "0/4607 [0/4846] page faults/reclaims" figure is much higher for this
query than for the other. Is that to be expected? Is there anything else I
can do to get the total number of records matched by the query and still use
LIMIT (I doubt it)?

If there isn't anything I can do, which looks to be the case here, I still
appreciate all the help you've given me..

I look forward to your response. Thanks!

-Mitch
