GIN internal query-plan alternatives.

From: jesper(at)krogh(dot)cc
To: pgsql-hackers(at)postgresql(dot)org
Subject: GIN internal query-plan alternatives.
Date: 2011-11-30 11:38:20
Message-ID: 2f73f1348320af71b0551e2934d588ca.squirrel@shrek.krogh.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi

I have a "feeling" that the internal query-plans (or alternative)
query-plans when executing GIN-searches are not being exhausted as much as
they generally are in PG.

More specifically a query like:

select id from table where fts @@ to_tsquery('english','verycommon &
veryrare');

Can do an search on the "very rare" and postfilter it on the
very-common keyword. I had problems trying to force the query-planner into
executing a query that forced that behavior, but here is my attempt:

2011-11-30 11:33:41.010 testdb=# explain analyze select id from
testdb.testtable where id in (select id from testdb.testtable where fts @@
pptsquery('veryrare')) and fts @@ pptsquery('verycommon') order by id desc
limit 300;
QUERY
PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=14203.33..14204.08 rows=300 width=4) (actual
time=62.561..62.567 rows=14 loops=1)
-> Sort (cost=14203.33..14206.91 rows=1430 width=4) (actual
time=62.561..62.566 rows=14 loops=1)
Sort Key: testdb.testtable.id
Sort Method: quicksort Memory: 25kB
-> Nested Loop (cost=4799.17..14137.35 rows=1430 width=4)
(actual time=11.225..62.530 rows=14 loops=1)
-> HashAggregate (cost=4799.17..4813.47 rows=1430
width=4) (actual time=6.792..7.941 rows=2409 loops=1)
-> Bitmap Heap Scan on testtable
(cost=1943.29..4795.59 rows=1430 width=4) (actual
time=0.962..5.174 rows=2409 loops=1)
Recheck Cond: (fts @@ '''veryrare'''::tsquery)
-> Bitmap Index Scan on testtable_gin_idx
(cost=0.00..1942.93 rows=1430 width=0) (actual
time=0.635..0.635 rows=2419 loops=1)
Index Cond: (fts @@
'''veryrare'''::tsquery)
-> Index Scan using testtable_pkey on testtable
(cost=0.00..6.51 rows=1 width=4) (actual time=0.022..0.022
rows=0 loops=2409)
Index Cond: (testdb.testtable.id =
testdb.testtable.id)
Filter: (testdb.testtable.fts @@
'''verycommon'''::tsquery)
Total runtime: 62.679 ms
(14 rows)

Time: 125.899 ms
2011-11-30 11:40:59.673 testdb=# explain analyze select id from
testdb.testtable where fts @@ pptsquery('verycommon veryrare') order by id
desc limit 300;
QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=2522.12..2522.87 rows=300 width=4) (actual
time=1282.967..1282.972 rows=14 loops=1)
-> Sort (cost=2522.12..2523.88 rows=704 width=4) (actual
time=1282.965..1282.968 rows=14 loops=1)
Sort Key: id
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on testtable (cost=1081.67..2489.63
rows=704 width=4) (actual time=1282.902..1282.948 rows=14
loops=1)
Recheck Cond: (fts @@ '''verycommon'' &
''veryrare'''::tsquery)
-> Bitmap Index Scan on testtable_gin_idx
(cost=0.00..1081.49 rows=704 width=0) (actual
time=1282.880..1282.880 rows=17 loops=1)
Index Cond: (fts @@ '''verycommon'' &
''veryrare'''::tsquery)
Total runtime: 1283.274 ms
(9 rows)

Time: 1300.587 ms
2011-11-30 11:41:13.217 testdb=#

This may of-course not always be the optimal query-plan, but in this
situation the alternative plan is roughly 20-times better.
(both queries cached).

The in-clause is not "the best way", a regular filter on the results of
the veryrare-term would be natural, but the query-planner cleverly
collapses that to be the exact same thing.

This can be stressed by just adding the same "common" keyword several time
to the GIN-search, where the query, even producing the same results gets
slower and slower:

2011-11-30 11:51:10.239 testdb=# select count(id) from testdb.testtable
where id in (select id from testdb.testtable where fts @@
pptsquery('veryrare')) and fts @@ pptsquery('verycommon verycommon
verycommon');
count
-------
14
(1 row)

Time: 90.389 ms
2011-11-30 11:51:16.777 testdb=# select count(id) from testdb.testtable
where fts @@ to_tsquery('english','veryrare & verycommon & verycommon &
verycommon');
count
-------
14
(1 row)

Time: 2207.125 ms

PG does a lot of query-rewriting and testing different query-plans, there
seems to be room for improvements here. I'll craft a test-dataset
where it can be reproduced. Then someone with internal PG knowledge can
tell me if it just isn't implemented or I might have some configuration
option that prevents my system from doing it correctly.

--
Jesper

Browse pgsql-hackers by date

  From Date Subject
Next Message Joel Jacobson 2011-11-30 12:07:01 Java LISTEN/NOTIFY client library work-around
Previous Message Greg Smith 2011-11-30 08:02:51 Re: Word-smithing doc changes