Skip site navigation (1) Skip section navigation (2)

Re: Queryplan within FTS/GIN index -search.

From: jesper(at)krogh(dot)cc
To: "Richard Huxton" <dev(at)archonet(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Queryplan within FTS/GIN index -search.
Date: 2009-10-23 09:04:22
Message-ID: 6bc6ee9ee43ffe63da2753dad40bd549.squirrel@shrek.krogh.cc (view raw or flat)
Thread:
Lists: pgsql-performance
> jesper(at)krogh(dot)cc wrote:
>>
>> So getting them with AND inbetween gives x100 better performance. All
>> queries are run on "hot disk" repeated 3-5 times and the number are from
>> the last run, so disk-read effects should be filtered away.
>>
>> Shouldn't it somehow just do what it allready are capable of doing?
>
> I'm guessing to_tsquery(...) will produce a tree of search terms (since
> it allows for quite complex expressions). Presumably there's a standard
> order it gets processed in too, so it should be possible to generate a
> more or less efficient ordering.
>
> That structure isn't exposed to the planner though, so it doesn't
> benefit from any re-ordering the planner would normally do for normal
> (exposed) AND/OR clauses.
>
> Now, to_tsquery() can't re-order the search terms because it doesn't
> know what column it's being compared against. In fact, it might not be a
> simple column at all.

I cant follow this logic based on explain output, but I may have
misunderstood something. The only difference in these two query-plans is
that we have an additional or'd term in the to_tsquery().

What we see is that, the query-planner indeed has knowledge about changes
in the row estimates based on changes in the query to to_tsquery(). My
guess is that it is because to_tsquery actually parses the query and give
the estimates, now how can to_tsquery give estimates without having access
to the statistics for the column?

ftstest=# explain select id from ftsbody where ftsbody_body_fts @@
to_tsquery('reallyrare');
                                   QUERY PLAN
---------------------------------------------------------------------------------
 Bitmap Heap Scan on ftsbody  (cost=132.64..190.91 rows=29 width=4)
   Recheck Cond: (ftsbody_body_fts @@ to_tsquery('reallyrare'::text))
   ->  Bitmap Index Scan on ftsbody_tfs_idx  (cost=0.00..132.63 rows=29
width=0)
         Index Cond: (ftsbody_body_fts @@ to_tsquery('reallyrare'::text))
(4 rows)

ftstest=# explain select id from ftsbody where ftsbody_body_fts @@
to_tsquery('reallyrare | morerare');
                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Bitmap Heap Scan on ftsbody  (cost=164.86..279.26 rows=57 width=4)
   Recheck Cond: (ftsbody_body_fts @@ to_tsquery('reallyrare |
morerare'::text))
   ->  Bitmap Index Scan on ftsbody_tfs_idx  (cost=0.00..164.84 rows=57
width=0)
         Index Cond: (ftsbody_body_fts @@ to_tsquery('reallyrare |
morerare'::text))
(4 rows)

ftstest=# explain select id from ftsbody where ftsbody_body_fts @@
to_tsquery('reallyrare | reallycommon');
                                QUERY PLAN
--------------------------------------------------------------------------
 Seq Scan on ftsbody  (cost=0.00..1023249.39 rows=5509293 width=4)
   Filter: (ftsbody_body_fts @@ to_tsquery('reallyrare |
reallycommon'::text))
(2 rows)


> 2. A variant to_tsquery_with_sorting() which would take the column-name
> or something and look up the stats to work against.

Does above not seem like its there allready?

(sorry.. looking at C-code from my point of view would set me a couple of
weeks back, so I have troble getting closer to the answer than
interpreting the output and guessing the rest).

-- 
Jesper


In response to

Responses

pgsql-performance by date

Next:From: Richard HuxtonDate: 2009-10-23 14:30:56
Subject: Re: Queryplan within FTS/GIN index -search.
Previous:From: Richard HuxtonDate: 2009-10-23 08:26:26
Subject: Re: Queryplan within FTS/GIN index -search.

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group