Re: Queryplan within FTS/GIN index -search.

From: jesper(at)krogh(dot)cc
To: "Richard Huxton" <dev(at)archonet(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Queryplan within FTS/GIN index -search.
Date: 2009-10-23 09:04:22
Message-ID: 6bc6ee9ee43ffe63da2753dad40bd549.squirrel@shrek.krogh.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

> jesper(at)krogh(dot)cc wrote:
>>
>> So getting them with AND inbetween gives x100 better performance. All
>> queries are run on "hot disk" repeated 3-5 times and the number are from
>> the last run, so disk-read effects should be filtered away.
>>
>> Shouldn't it somehow just do what it allready are capable of doing?
>
> I'm guessing to_tsquery(...) will produce a tree of search terms (since
> it allows for quite complex expressions). Presumably there's a standard
> order it gets processed in too, so it should be possible to generate a
> more or less efficient ordering.
>
> That structure isn't exposed to the planner though, so it doesn't
> benefit from any re-ordering the planner would normally do for normal
> (exposed) AND/OR clauses.
>
> Now, to_tsquery() can't re-order the search terms because it doesn't
> know what column it's being compared against. In fact, it might not be a
> simple column at all.

I cant follow this logic based on explain output, but I may have
misunderstood something. The only difference in these two query-plans is
that we have an additional or'd term in the to_tsquery().

What we see is that, the query-planner indeed has knowledge about changes
in the row estimates based on changes in the query to to_tsquery(). My
guess is that it is because to_tsquery actually parses the query and give
the estimates, now how can to_tsquery give estimates without having access
to the statistics for the column?

ftstest=# explain select id from ftsbody where ftsbody_body_fts @@
to_tsquery('reallyrare');
QUERY PLAN
---------------------------------------------------------------------------------
Bitmap Heap Scan on ftsbody (cost=132.64..190.91 rows=29 width=4)
Recheck Cond: (ftsbody_body_fts @@ to_tsquery('reallyrare'::text))
-> Bitmap Index Scan on ftsbody_tfs_idx (cost=0.00..132.63 rows=29
width=0)
Index Cond: (ftsbody_body_fts @@ to_tsquery('reallyrare'::text))
(4 rows)

ftstest=# explain select id from ftsbody where ftsbody_body_fts @@
to_tsquery('reallyrare | morerare');
QUERY PLAN
-------------------------------------------------------------------------------------
Bitmap Heap Scan on ftsbody (cost=164.86..279.26 rows=57 width=4)
Recheck Cond: (ftsbody_body_fts @@ to_tsquery('reallyrare |
morerare'::text))
-> Bitmap Index Scan on ftsbody_tfs_idx (cost=0.00..164.84 rows=57
width=0)
Index Cond: (ftsbody_body_fts @@ to_tsquery('reallyrare |
morerare'::text))
(4 rows)

ftstest=# explain select id from ftsbody where ftsbody_body_fts @@
to_tsquery('reallyrare | reallycommon');
QUERY PLAN
--------------------------------------------------------------------------
Seq Scan on ftsbody (cost=0.00..1023249.39 rows=5509293 width=4)
Filter: (ftsbody_body_fts @@ to_tsquery('reallyrare |
reallycommon'::text))
(2 rows)

> 2. A variant to_tsquery_with_sorting() which would take the column-name
> or something and look up the stats to work against.

Does above not seem like its there allready?

(sorry.. looking at C-code from my point of view would set me a couple of
weeks back, so I have troble getting closer to the answer than
interpreting the output and guessing the rest).

--
Jesper

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Richard Huxton 2009-10-23 14:30:56 Re: Queryplan within FTS/GIN index -search.
Previous Message Richard Huxton 2009-10-23 08:26:26 Re: Queryplan within FTS/GIN index -search.