Queryplan within FTS/GIN index -search.

From: Jesper Krogh <jesper(at)krogh(dot)cc>
To: pgsql-performance(at)postgresql(dot)org
Subject: Queryplan within FTS/GIN index -search.
Date: 2009-10-22 16:28:13
Message-ID: 4AE0881D.9070201@krogh.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi

My indexing base is now up to 7.5m documents, I have raise statistics
target to 1000 for the tsvector column in order to make the
query-planner choose more correctly. That works excellent.

Table structure is still:
ftstest=# \d ftsbody
Table "public.ftsbody"
Column | Type | Modifiers

------------------+----------+------------------------------------------------------
id | integer | not null default
nextval('ftsbody_id_seq'::regclass)
body | text | not null default ''::text
ftsbody_body_fts | tsvector |
Indexes:
"ftsbody_body_md5" UNIQUE, btree (md5(body))
"ftsbody_id_pri_idx" UNIQUE, btree (id)
"ftsbody_tfs_idx" gin (ftsbody_body_fts)
Triggers:
tsvectorupdate BEFORE INSERT OR UPDATE ON uniprot FOR EACH ROW
EXECUTE PROCEDURE tsvector_update_trigger('ftsbody_body_fts',
'pg_catalog.english', 'body')

I'm searching the gin-index for 1-5 terms, where all of them matches the
same document. TERM1 is unique by itself, TERM2 is a bit more common (52
rows), TERM3 more common, TERM4 close to all and TERM5 all records.

Just quering for a unique value and add in several values that match
everything makes the run-time go significantly up.

I somehow would expect the index-search to take advantage of the MCV's
informations in the statistics that sort of translate it into a search
and post-filtering (as PG's queryplanner usually does at the SQL-level).

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=102.45..102.45 rows=1 width=751) (actual time=3.726..3.729
rows=1 loops=1)
-> Sort (cost=102.45..102.45 rows=1 width=751) (actual
time=3.722..3.723 rows=1 loops=1)
Sort Key: id
Sort Method: quicksort Memory: 27kB
-> Bitmap Heap Scan on ftsbody (cost=100.42..102.44 rows=1
width=751) (actual time=3.700..3.702 rows=1 loops=1)
Recheck Cond: (ftsbody_body_fts @@ to_tsquery('TERM1 &
TERM2'::text))
-> Bitmap Index Scan on ftsbody_tfs_idx
(cost=0.00..100.42 rows=1 width=0) (actual time=3.683..3.683 rows=1 loops=1)
Index Cond: (ftsbody_body_fts @@ to_tsquery('TERM1
& TERM2'::text))
Total runtime: 3.790 ms
(9 rows)

QUERY
PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=102.45..102.45 rows=1 width=751) (actual
time=850.017..850.020 rows=1 loops=1)
-> Sort (cost=102.45..102.45 rows=1 width=751) (actual
time=850.013..850.015 rows=1 loops=1)
Sort Key: id
Sort Method: quicksort Memory: 27kB
-> Bitmap Heap Scan on ftsbody (cost=100.42..102.44 rows=1
width=751) (actual time=849.991..849.993 rows=1 loops=1)
Recheck Cond: (ftsbody_body_fts @@ to_tsquery('TERM1 &
TERM2 & TERM3'::text))
-> Bitmap Index Scan on ftsbody_tfs_idx
(cost=0.00..100.42 rows=1 width=0) (actual time=849.970..849.970 rows=1
loops=1)
Index Cond: (ftsbody_body_fts @@ to_tsquery('TERM1
& TERM2 & TERM3'::text))
Total runtime: 850.084 ms
(9 rows)

QUERY
PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=102.45..102.45 rows=1 width=751) (actual
time=1152.065..1152.068 rows=1 loops=1)
-> Sort (cost=102.45..102.45 rows=1 width=751) (actual
time=1152.061..1152.062 rows=1 loops=1)
Sort Key: id
Sort Method: quicksort Memory: 27kB
-> Bitmap Heap Scan on ftsbody (cost=100.42..102.44 rows=1
width=751) (actual time=1152.039..1152.041 rows=1 loops=1)
Recheck Cond: (ftsbody_body_fts @@ to_tsquery('TERM1 &
TERM2 & TERM3 & TERM4'::text))
-> Bitmap Index Scan on ftsbody_tfs_idx
(cost=0.00..100.42 rows=1 width=0) (actual time=1152.020..1152.020
rows=1 loops=1)
Index Cond: (ftsbody_body_fts @@ to_tsquery('TERM1
& TERM2 & TERM3 & TERM4'::text))
Total runtime: 1152.129 ms
(9 rows)

QUERY
PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=102.45..102.45 rows=1 width=751) (actual
time=1509.043..1509.046 rows=1 loops=1)
-> Sort (cost=102.45..102.45 rows=1 width=751) (actual
time=1509.040..1509.040 rows=1 loops=1)
Sort Key: id
Sort Method: quicksort Memory: 27kB
-> Bitmap Heap Scan on ftsbody (cost=100.42..102.44 rows=1
width=751) (actual time=1509.018..1509.020 rows=1 loops=1)
Recheck Cond: (ftsbody_body_fts @@ to_tsquery('TERM1 &
TERM2 & TERM3 & TERM4 & TERM5'::text))
-> Bitmap Index Scan on ftsbody_tfs_idx
(cost=0.00..100.42 rows=1 width=0) (actual time=1508.998..1508.998
rows=1 loops=1)
Index Cond: (ftsbody_body_fts @@ to_tsquery('TERM1
& TERM2 & TERM3 & TERM4 & TERM5'::text))
Total runtime: 1509.109 ms
(9 rows)

Can (perhaps more readable) be found at http://krogh.cc/~jesper/test.out

Can this be optimized? (I cannot really prevent users from typing stuff
in that are common).

--
Jesper

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Kevin Buckham 2009-10-22 18:50:44 Table Clustering & Time Range Queries
Previous Message Robert Haas 2009-10-22 13:26:56 Re: optimizing query with multiple aggregates