Full text indexing (and errors!)

From: "Mitch Vincent" <mitch(at)venux(dot)net>
To: <pgsql-sql(at)postgresql(dot)org>
Subject: Full text indexing (and errors!)
Date: 2000-05-21 16:59:14
Message-ID: 00db01bfc345$e5400100$0300000a@doot.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

I posted this to -hackers but thought it might be of some use to people here
as well. I look forward to any and all comments.. Thanks!

----------------------------------

I finally finished the fulltextindex ( in contrib/fulltextindex) and am
wondering if I might be able to do something that I haven't already to make
these queries a bit faster.. There is a lot of data here, I know, but as it
is right now it is faster for me to use grep (or agrep) to search the files
and put the matching files into a temporay table and qualify results on
that... I had hoped this would be the case :-)

Lets see.. (Most of this is in contrib/fulltextindex but I'll repeat it
here)

I have 2 tables.

Table "applicants_resumes"
Attribute | Type | Modifier
-------------+---------+----------
app_id | integer |
resume_text | text |
Index: resumes_oid_index

And

Table "resumes_fti"
Attribute | Type | Modifier
-----------+--------------+----------
string | varchar(255) |
id | oid |
Index: resume_fti_index

The resumes_fti table holds the broken up chunks of text (duh) ..

I also did the CLUSTER resumes_fti_index on resumes_fti

When I run a query I get :

ipa=# explain select c.* from applicants c, resumes_fti f1 where f1.string ~
'^engineer' and f1.id = c.oid;
NOTICE: QUERY PLAN:

Hash Join (cost=918.11..7969.43 rows=56443 width=607)
-> Seq Scan on applicants c (cost=0.00..1331.56 rows=9856 width=603)
-> Hash (cost=5.00..5.00 rows=168041 width=4)
-> Index Scan using resume_fti_index on resumes_fti f1
(cost=0.00..5.00 rows=168041 width=4)

EXPLAIN
ipa=#

query: select c.* from applicants c, resumes_fti f1 where f1.string ~
'^engineer' and f1.id = c.oid;
ProcessQuery
! system usage stats:
! 304.304711 elapsed 9.111341 user 7.992034 system sec
! [9.172874 user 8.012545 sys total]
! 53309/1599 [53411/1599] filesystem blocks in/out
! 0/295 [0/605] page faults/reclaims, 0 [0] swaps
! 0 [0] signals rcvd, 0/0 [4/5] messages rcvd/sent
! 53168/226 [53255/232] voluntary/involuntary context switches
! postgres usage stats:
! Shared blocks: 58171 read, 0 written, buffer hit rate
= 14.19%
! Local blocks: 0 read, 0 written, buffer hit rate
= 0.00%
! Direct blocks: 0 read, 0 written
CommitTransactionCommand
proc_exit(0)
shmem_exit(0)
exit(0)

... As you can see, really really slow. So I thought baout creating an index
on 'string' in resumes_fti (makes since as this is suppose to be a full text
index) -- Note that's not listed in the README which was very odd to me. I
don't understand the point in breaking all the text out into chunks if you
don't then index it. The way it is after you get done with the README is
just a broken up version of what you already had (a bunch of text fields).
It was my understanding that the fulltextindex was meant to get around the
inability to index a full text field by breaking the full text field up into
smaller varchar() fields which could them be indexed...

Anyway, I tried --

ipa=# CREATE INDEX "rstring_fti_index" on resumes_fti(lower(string));
FATAL 1: Memory exhausted in AllocSetAlloc()
pqReadData() -- backend closed the channel unexpectedly.
This probably means the backend terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
ipa=#

... and as you can see it messed up pretty badly after about 15 minutes.
This machine is a PII400 with 256 Megs of RAM and 520 megs of swap running
PostgreSQL 7.0..

Sorry about the length of this post and thanks for any
pointers/comments/ideas..

-Mitch Vincent

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Tom Lane 2000-05-21 17:12:34 Re: Foreign keys breaks tables permissions
Previous Message Peter Eisentraut 2000-05-21 16:45:20 Re: Foreign keys breaks tables permissions