Re: [SQL] FTI, paged, ranked searching and efficiency.

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: sqllist <pgsql-sql(at)postgresql(dot)org>, phpp <pgsql-php(at)postgresql(dot)org>
Subject: Re: [SQL] FTI, paged, ranked searching and efficiency.
Date: 2000-11-15 17:03:22
Message-ID: 3A12C1DA.D296628C@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-php pgsql-sql

Paul,

I'm afraid that much of that was over my head. In fact, I'm keeping it
as an example in case I ever need to do something similar. Forward your
info and I'll include credit in the source :-)

In general terms, I've always depended on the PHP to select a "page" of
results, using the logic that the number of results on a page is a
matter for the web application to handle (a display issue) rather than
something to be handled on the back-end (a data issue).

However, you point about not pulling out tons of data the user will
never examine (i.e. 2nd and succeeding pages) is well-taken. Although,
without pulling the entire data set, you can't display to the user how
many results there are, total. If it's a strong possibility that the
users are really only ever going to want the top 20-40 rated results,
then splitting it as you suggest ... first, counting all the matches and
then dragging in the rest of the data for the top X records ... makes a
lot of sense.

Unfortunately, your only real option I can see for DB server-side row
grabbing is: Create the query(ies) as a temporary table or view using a
function. Then use Limit and Offset to grab one chunk of data at a
time. This is, of course, a serious problem for mutli-user performance
since eash user would need their own temp table or view.

From what I can tell, search engines (google, for example) grab the
whole recordset and use the web script to parse it out 25 records at a
time.

Hopefully, someone on this list will have done that before and can
provide less theoretical advice.

-Josh Berkus

P.S. I've also posted this to the pgsql-php list. I;ve quoted the full
text of your question below my .sig for that reason.

--
______AGLIO DATABASE SOLUTIONS___________________________
Josh Berkus
Complete information technology josh(at)agliodbs(dot)com
and data management solutions (415) 436-9166
for law firms, small businesses fax 436-0137
and non-profit organizations. pager 338-4078
San Francisco

Paul wrote:
>
> Hello,
>
> This is going to be a bit long, I hope some of you will take the
> trouble to read it :)
>
> I am building a search engine for a section of a (PHP based) website.
> I wish the user to be able to a number of words in the search, and the
> search results to be ranked by the number of times words occur (both
> different words and the same word occuring multiple times are good).
>
> My (simplified) table structure is this:
> ======
> Table "entry_fti"
> Attribute | Type | Modifier
> -----------+-------------+----------
> string | varchar(25) |
> id | oid |
> Index: entry_fti_string_idx
>
> Table "entry"
> Attribute | Type | Modifier
> -----------------------+-------+----------------------------------------------
> -
> entry_id |integer| not null default
> nextval('entry_id_seq'::text)
> entry_name |text |
> entry_description_html|text |
> entry_image_id |integer| not null default 0
> entry_tn_image_id |integer| not null default 0
> entry_live |boolean| not null default 't'
> Index: entry_pkey
>
> Table "image"
> Attribute | Type | Modifier
> ------------+-------------+------------------------------------------------
> image_id | integer | not null default nextval('image_id_seq'::text)
> image_name | varchar(32) |
> height | integer | not null
> width | integer | not null
> Indices: image_pkey
> ======
>
> And my (simplified) query looks like this:
> ======
> SELECT COUNT(entry_fti.id) AS rating,
> entry.entry_name AS name,
> entry.entry_id AS id,
> entry.entry_description_html AS description_html,
> image.image_name AS thumb1_name,
> image.height AS thumb1_height,
> image.width AS thumb1_width
> FROM entry, entry_fti, image
> WHERE entry_fti.id=entry.oid
> AND entry.entrytn_image_id=image.image_id
> AND entry.entry_live = 't'::bool
> AND (
> entry_fti.string ~'^word1'
> OR
> entry_fti.string ~'^word2'
> OR
> .
> .
> OR
> entry_fti.string ~'^wordn'
> )
> GROUP BY entry.entry_id,
> entry.entry_name,
> entry.entry_description_html,
> image.image_name,
> image.height,
> image.width
> ORDER BY rating DESC
> ======
>
> Now this all works, which is good. My problem now is that I want to
> limit the number of results shown on a page to 20 and show the number
> of pages of extra results, much like you'd see on any search engine site.
> Naturally I immediatly thought of the LIMIT and OFFSET clauses, but then:
> a) I'd need to do an extra query, to find out the total number of results
> to show the number of pages on the webpage.
> b) I have no idea how to write that query. It'd be a COUNT of 'rating'
> in the above, which would be a COUNT(COUNT(entry_fti.id)) which
> would probably require some hideous (and not legal?) double GROUP
> BY construct. Ouch.
>
> So basically, LIMIT/OFFSET looks like a no go. This leaves me with just
> doing the above query, and using PHP to jump to a particular row in the
> results depending on what page you are on and pg_numrows() to
> calculate the number of pages.
>
> Would that be particularly inefficient?
> Should I be looking harder for a LIMIT/OFFSET based solution?
>
> Perhaps I'd be better off splitting it into two queries, one to just
> get the entry_id list in order, then another query to pull out the
> rest of the information for the 20 of those entry_ids that are on the results
> page I wish to show?
> That would stop Postgres from gathering so much information that I am just
> going to throw away anyway without looking at.
>
> Any ideas? Have I missed something obvious that will help me? Or better yet,
> can someone who has done this sort of thing before tell me whether I am on the
> right track?
>
> Paul

In response to

Responses

Browse pgsql-php by date

  From Date Subject
Next Message Stephen van Egmond 2000-11-15 17:25:16 Re: Re: [SQL] FTI, paged, ranked searching and efficiency.
Previous Message Paul 2000-11-15 01:23:04 FTI, paged, ranked searching and efficiency.

Browse pgsql-sql by date

  From Date Subject
Next Message Stephen van Egmond 2000-11-15 17:25:16 Re: Re: [SQL] FTI, paged, ranked searching and efficiency.
Previous Message Patrik Kudo 2000-11-15 16:21:16 Index problem