Re: Identifying which column matches a full text search

From: Richard Huxton <dev(at)archonet(dot)com>
To: Ryan Wallace <rywall(at)interchange(dot)ubc(dot)ca>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: Identifying which column matches a full text search
Date: 2008-07-30 18:33:42
Message-ID: 4890B406.8080307@archonet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Ryan Wallace wrote:
> Richard Huxton wrote:
>> Failing that, where I've had many (a dozen) different sources but want
>> to search them all I've built a textsearch_blocks table with columns to
>> identify the source and have triggers that keep it up to date.
>
> Once you've built the text search blocks table, how do you search it? Do you
> perform
> twelve separate queries or can you just do one?

OK, you have a table something like:

fulltext_blocks (
section varchar(32),
itemid int4,
words tsvector,them
PRIMARY KEY (section, itemid)
)

Now assume two of the things I search are "news" and "faqs". I'm
assuming they've both got a simple serial pkey - if not, "itemid" above
needs to be text and you'll have to cast.

For each target table (news, faqs) add a trigger that updates
fulltext_blocks appropriately. This can include weighting title and body
of a news article.

Then, search the fulltext_blocks table, optionally filtering by section.
If you're going to have lots of results put the ids into a (perhapd
temporary) results-table. Then join your results back to the original
tables with the appropriate UNION (if you need to - it might be you
fetch results one at a time elsewhere in your app).

SELECT n.id, n.title, n.body
FROM news n JOIN results r ON n.id=r.id
WHERE r.section='news'
UNION ALL
SELECT f.id, f,question, f.answer
FROM faqs f JOIN results r ON f.id=r.id
WHERE r.section='faqs'
;

You'll probably want to set ownership/permissions on the triggers /
fulltext_blocks table so you can't accidentally update it directly.

In mine I even had a "documents" section which relied on an external
cron-driven script to strip the first 32k of text out of uploaded
documents (pdf,word) in addition to user-supplied metadata (title, summary).

Note - this is basically simulating what we could do if you could index
a view. The fulltext_blocks table is nothing more than a materialised view.

HTH

--
Richard Huxton
Archonet Ltd

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Marcin Stępnicki 2008-07-31 13:03:25 Function returning setof taking parameters from another table
Previous Message Ryan Wallace 2008-07-30 17:51:27 Re: Identifying which column matches a full text search