FTI, paged, ranked searching and efficiency.

From: Paul <paul(at)operamail(dot)com>
To: pgsql-sql(at)postgresql(dot)org
Subject: FTI, paged, ranked searching and efficiency.
Date: 2000-11-15 01:23:04
Message-ID: 3A36DC72@operamail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-php pgsql-sql

Hello,

This is going to be a bit long, I hope some of you will take the
trouble to read it :)

I am building a search engine for a section of a (PHP based) website.
I wish the user to be able to a number of words in the search, and the
search results to be ranked by the number of times words occur (both
different words and the same word occuring multiple times are good).

My (simplified) table structure is this:
======
Table "entry_fti"
Attribute | Type | Modifier
-----------+-------------+----------
string | varchar(25) |
id | oid |
Index: entry_fti_string_idx

Table "entry"
Attribute | Type | Modifier
-----------------------+-------+----------------------------------------------
-
entry_id |integer| not null default
nextval('entry_id_seq'::text)
entry_name |text |
entry_description_html|text |
entry_image_id |integer| not null default 0
entry_tn_image_id |integer| not null default 0
entry_live |boolean| not null default 't'
Index: entry_pkey

Table "image"
Attribute | Type | Modifier
------------+-------------+------------------------------------------------
image_id | integer | not null default nextval('image_id_seq'::text)
image_name | varchar(32) |
height | integer | not null
width | integer | not null
Indices: image_pkey
======

And my (simplified) query looks like this:
======
SELECT COUNT(entry_fti.id) AS rating,
entry.entry_name AS name,
entry.entry_id AS id,
entry.entry_description_html AS description_html,
image.image_name AS thumb1_name,
image.height AS thumb1_height,
image.width AS thumb1_width
FROM entry, entry_fti, image
WHERE entry_fti.id=entry.oid
AND entry.entrytn_image_id=image.image_id
AND entry.entry_live = 't'::bool
AND (
entry_fti.string ~'^word1'
OR
entry_fti.string ~'^word2'
OR
.
.
OR
entry_fti.string ~'^wordn'
)
GROUP BY entry.entry_id,
entry.entry_name,
entry.entry_description_html,
image.image_name,
image.height,
image.width
ORDER BY rating DESC
======

Now this all works, which is good. My problem now is that I want to
limit the number of results shown on a page to 20 and show the number
of pages of extra results, much like you'd see on any search engine site.
Naturally I immediatly thought of the LIMIT and OFFSET clauses, but then:
a) I'd need to do an extra query, to find out the total number of results
to show the number of pages on the webpage.
b) I have no idea how to write that query. It'd be a COUNT of 'rating'
in the above, which would be a COUNT(COUNT(entry_fti.id)) which
would probably require some hideous (and not legal?) double GROUP
BY construct. Ouch.

So basically, LIMIT/OFFSET looks like a no go. This leaves me with just
doing the above query, and using PHP to jump to a particular row in the
results depending on what page you are on and pg_numrows() to
calculate the number of pages.

Would that be particularly inefficient?
Should I be looking harder for a LIMIT/OFFSET based solution?

Perhaps I'd be better off splitting it into two queries, one to just
get the entry_id list in order, then another query to pull out the
rest of the information for the 20 of those entry_ids that are on the results
page I wish to show?
That would stop Postgres from gathering so much information that I am just
going to throw away anyway without looking at.

Any ideas? Have I missed something obvious that will help me? Or better yet,
can someone who has done this sort of thing before tell me whether I am on the
right track?

Paul

Responses

Browse pgsql-php by date

  From Date Subject
Next Message Josh Berkus 2000-11-15 17:03:22 Re: [SQL] FTI, paged, ranked searching and efficiency.
Previous Message Aguinaldo Fagundes Junior 2000-11-06 15:38:46 Select count

Browse pgsql-sql by date

  From Date Subject
Next Message Alvar Freude 2000-11-15 02:47:29 Re: Using Array-Values in subselect
Previous Message Tom Lane 2000-11-14 23:16:49 Re: Using Array-Values in subselect