Re: Queryplan within FTS/GIN index -search.

From: jesper(at)krogh(dot)cc
To: "Jeff Davis" <pgsql(at)j-davis(dot)com>
Cc: "Jesper Krogh" <jesper(at)krogh(dot)cc>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Queryplan within FTS/GIN index -search.
Date: 2009-10-23 07:45:44
Message-ID: d491705be3c4b2463b01fb8113e3e886.squirrel@shrek.krogh.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

> On Fri, 2009-10-23 at 07:18 +0200, Jesper Krogh wrote:
>> > In effect, what you want are words that aren't searched (or stored) in
>> > the index, but are included in the tsvector (so the RECHECK still
>> > works). That sounds like it would solve your problem and it would
>> reduce
>> > index size, improve update performance, etc. I don't know how
>> difficult
< > it would be to implement, but it sounds reasonable to me.
>
>> That sounds like it could require an index rebuild if the distribution
>> changes?
>
> My thought was that the common words could be declared to be common the
> same way stop words are. As long as words are only added to this list,
> it should be OK.
>
>> That would be another plan to pursue, but the MCV is allready there
>
> The problem with MCVs is that the index search can never eliminate
> documents because they don't contain a match, because it might contain a
> match that was previously an MCV, but is no longer.

No, it definately has to go visit the index/table to confirm findings, but
that why I wrote Queryplan in the subject line, because this os only about
the strategy to pursue to obtain the results. And a strategy about
limiting the amout of results as early as possible (as PG usually does)
would be what I'd expect and MCV can help it guess on that.

Similar finding, rewrite the query: (now i took the extreme and made
"raretem" a spellingerror), so result is 0.

ftstest=# explain analyze select body from ftsbody where ftsbody_body_fts
@@ to_tsquery('commonterm & spellerror') limit 100;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=132.63..188.89 rows=28 width=739) (actual
time=862.714..862.714 rows=0 loops=1)
-> Bitmap Heap Scan on ftsbody (cost=132.63..188.89 rows=28
width=739) (actual time=862.711..862.711 rows=0 loops=1)
Recheck Cond: (ftsbody_body_fts @@ to_tsquery('commonterm &
spellerror'::text))
-> Bitmap Index Scan on ftsbody_tfs_idx (cost=0.00..132.62
rows=28 width=0) (actual time=862.702..862.702 rows=0 loops=1)
Index Cond: (ftsbody_body_fts @@ to_tsquery('commonterm &
spellerror'::text))
Total runtime: 862.771 ms
(6 rows)

ftstest=# explain analyze select body from ftsbody where ftsbody_body_fts
@@ to_tsquery('commonterm') and ftsbody_body_fts @@
to_tsquery('spellerror') limit 100;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=132.70..189.11 rows=28 width=739) (actual time=8.669..8.669
rows=0 loops=1)
-> Bitmap Heap Scan on ftsbody (cost=132.70..189.11 rows=28
width=739) (actual time=8.665..8.665 rows=0 loops=1)
Recheck Cond: ((ftsbody_body_fts @@
to_tsquery('commonterm'::text)) AND (ftsbody_body_fts @@
to_tsquery('spellerror'::text)))
-> Bitmap Index Scan on ftsbody_tfs_idx (cost=0.00..132.70
rows=28 width=0) (actual time=8.658..8.658 rows=0 loops=1)
Index Cond: ((ftsbody_body_fts @@
to_tsquery('commonterm'::text)) AND (ftsbody_body_fts @@
to_tsquery('spellerror'::text)))
Total runtime: 8.724 ms
(6 rows)

So getting them with AND inbetween gives x100 better performance. All
queries are run on "hot disk" repeated 3-5 times and the number are from
the last run, so disk-read effects should be filtered away.

Shouldn't it somehow just do what it allready are capable of doing?

--
Jesper

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Richard Huxton 2009-10-23 08:26:26 Re: Queryplan within FTS/GIN index -search.
Previous Message Scara Maccai 2009-10-23 07:23:06 Re: Table Clustering & Time Range Queries