Re: Full Text Search ideas

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Howard Rogers <hjr(at)diznix(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Full Text Search ideas
Date: 2010-07-18 20:16:09
Message-ID: 19819.1279484169@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Howard Rogers <hjr(at)diznix(dot)com> writes:
> ims=# select count(*) from search_rm
> where to_tsvector('english', textsearch) @@ to_tsquery('english','bat & sb12n');
> count
> -------
> 3849
> (1 row)

> Time: 408.962 ms

> ims=# select count(*) from search_rm
> where to_tsvector('english', textsearch) @@ to_tsquery('english','bat & !sb12y');
> count
> -------
> 3849
> (1 row)

> Time: 11.533 ms

Yeah, I imagine that the first of these will involve examining all the
index entries for sb12n. There's not a lot of smarts about that inside
the GIN index machinery, AFAIK: it'll just fetch all the relevant TIDs
for both terms and then AND them.

> I'm wondering firstly if there's any way I can configure PostgreSQL
> FTS so that it produces the sort of results we've gotten used to from
> Oracle, i.e., where search speeds do not go up wildly when a 'search
> term' is applied that happens to be used by the vast majority of
> document records.

If you're willing to split out the search terms that are like this,
you could probably get better results with something like

select count(*) from search_rm
where to_tsvector('english', textsearch) @@ to_tsquery('english','bat') AND
to_tsvector('english', textsearch) @@ to_tsquery('english','sb12n');

That will put it on the optimizer's head as to whether to use the index
for one term or both terms.

It might be worth noting that the optimizer will of course not get this
right unless it has decent statistics about both search terms --- and
there is an as-yet-unreleased patch about tsvector stats gathering:
http://archives.postgresql.org/pgsql-committers/2010-05/msg00360.php
I am not sure that the situation addressed by that patch applies in
your case, but it might.

regards, tom lane

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Andre Lopes 2010-07-18 20:24:14 Re: How to import *.sql file to postgresql database
Previous Message John Gage 2010-07-18 20:13:28 Differences between Postgres and MySql