Skip site navigation (1) Skip section navigation (2)

Re: GiST index performance

From: dforum <dforums(at)vieonet(dot)com>
To: Matthew Wakeling <matthew(at)flymine(dot)org>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: GiST index performance
Date: 2009-04-16 17:19:18
Message-ID: 49E76896.7080001@vieonet.com (view raw or flat)
Thread:
Lists: pgsql-performance
hello,

there is other performance problem on this request.

If you analyse query plan, you see that most of the time are lost during 
sequencial scan, and you have 2 seq scan.

You have to create other indexes to match the request.

Postgresq is totally dependant on index to reach is performance.

Regarding gist or btree, I personnaly had better performance with btree.

Regards

david

Matthew Wakeling a écrit :
>
> I have been doing some queries that are best answered with GiST 
> indexes, however I have found that their performance is a little 
> lacking. I thought I would do a direct comparison on a level playing 
> field. Here are two EXPLAIN ANALYSE results for the same query, with 
> two different indexes. The two indexes are identical except that one 
> is btree and the other GiST.
>
> Here is the query:
>
> SELECT *
> FROM
>     location l1,
>     location l2,
>     gene,
>     primer
> WHERE
>         l1.subjectid <> l2.subjectid
>     AND l1.objectid = l2.objectid
>     AND l1.subjectid = gene.id
>     AND l2.subjectid = primer.id
>     AND l2.intermine_start <= l1.intermine_start
>     AND l2.intermine_end >= l1.intermine_start
>
> Here is the btree index:
>
> CREATE INDEX location_object_start ON location (objectid, 
> intermine_start);
>
> QUERY PLAN
> ----------------------------------------------------------------------
>  Hash Join
>    (cost=26213.16..135980894.76 rows=3155740824 width=484)
>    (actual time=2799.260..14256.588 rows=2758 loops=1)
>    Hash Cond: (l1.subjectid = gene.id)
>    ->  Nested Loop
>          (cost=0.00..4364485.01 rows=8891802645 width=324)
>          (actual time=9.748..10418.807 rows=390695 loops=1)
>          Join Filter: (l1.subjectid <> l2.subjectid)
>          ->  Nested Loop
>                (cost=0.00..446862.58 rows=572239 width=259)
>                (actual time=9.720..4226.117 rows=211880 loops=1)
>                ->  Seq Scan on primer
>                      (cost=0.00..15358.80 rows=211880 width=194)
>                      (actual time=9.678..579.877 rows=211880 loops=1)
>                ->  Index Scan using location__key_all on location l2
>                      (cost=0.00..2.00 rows=3 width=65)
>                      (actual time=0.004..0.007 rows=1 loops=211880)
>                      Index Cond: (l2.subjectid = primer.id)
>          ->  Index Scan using location_object_start on location l1
>                (cost=0.00..3.85 rows=150 width=65)
>                (actual time=0.005..0.012 rows=3 loops=211880)
>                Index Cond: ((l1.objectid = l2.objectid) AND 
> (l2.intermine_start <= l1.intermine_start) AND (l2.intermine_end >= 
> l1.intermine_start))
>    ->  Hash
>          (cost=20496.96..20496.96 rows=457296 width=160)
>          (actual time=2788.698..2788.698 rows=457296 loops=1)
>          ->  Seq Scan on gene
>                (cost=0.00..20496.96 rows=457296 width=160)
>                (actual time=0.038..1420.604 rows=457296 loops=1)
>  Total runtime: 14263.846 ms
> (13 rows)
>
>
> Here is the GiST index:
>
> CREATE INDEX location_object_start_gist ON location USING gist 
> (objectid, intermine_start);
>
> QUERY PLAN
> ------------------------------------------------------------------------
>  Hash Join
>    (cost=26213.16..136159960.32 rows=3155740824 width=484)
>    (actual time=2576.109..2300486.267 rows=2758 loops=1)
>    Hash Cond: (l1.subjectid = gene.id)
>    ->  Nested Loop
>          (cost=0.00..4543550.56 rows=8891802645 width=324)
>          (actual time=366.121..2296668.740 rows=390695 loops=1)
>          Join Filter: (l1.subjectid <> l2.subjectid)
>          ->  Nested Loop
>                (cost=0.00..446862.58 rows=572239 width=259)
>                (actual time=362.774..13423.443 rows=211880 loops=1)
>                ->  Seq Scan on primer
>                      (cost=0.00..15358.80 rows=211880 width=194)
>                      (actual time=319.559..1296.907 rows=211880 loops=1)
>                ->  Index Scan using location__key_all on location l2
>                      (cost=0.00..2.00 rows=3 width=65)
>                      (actual time=0.041..0.045 rows=1 loops=211880)
>                      Index Cond: (l2.subjectid = primer.id)
>          ->  Index Scan using location_object_start_gist on location l1
>                (cost=0.00..4.16 rows=150 width=65)
>                (actual time=3.354..10.757 rows=3 loops=211880)
>                Index Cond: ((l1.objectid = l2.objectid) AND 
> (l2.intermine_start <= l1.intermine_start) AND (l2.intermine_end >= 
> l1.intermine_start))
>    ->  Hash
>          (cost=20496.96..20496.96 rows=457296 width=160)
>          (actual time=2157.914..2157.914 rows=457296 loops=1)
>          ->  Seq Scan on gene
>                (cost=0.00..20496.96 rows=457296 width=160)
>                (actual time=3.904..1206.907 rows=457296 loops=1)
>  Total runtime: 2300510.674 ms
> (13 rows)
>
> The query plans are identical except in the type of index used, but 
> there is a factor of a few hundred in execute time. Is this the kind 
> of factor that would be expected, or is there something amiss? Is this 
> seen as something that might be improved in the future?
>
> Matthew
>


In response to

Responses

pgsql-performance by date

Next:From: Matthew WakelingDate: 2009-04-16 17:23:49
Subject: Re: GiST index performance
Previous:From: Tom LaneDate: 2009-04-16 16:46:46
Subject: Re: GiST index performance

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group