Quick Links

Re: GiST index performance

From:	dforum <dforums(at)vieonet(dot)com>
To:	Matthew Wakeling <matthew(at)flymine(dot)org>
Cc:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: GiST index performance
Date:	2009-04-16 17:19:18
Message-ID:	49E76896.7080001@vieonet.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

hello,

there is other performance problem on this request.

If you analyse query plan, you see that most of the time are lost during
sequencial scan, and you have 2 seq scan.

You have to create other indexes to match the request.

Postgresq is totally dependant on index to reach is performance.

Regarding gist or btree, I personnaly had better performance with btree.

Regards

david

Matthew Wakeling a écrit :
>
> I have been doing some queries that are best answered with GiST
> indexes, however I have found that their performance is a little
> lacking. I thought I would do a direct comparison on a level playing
> field. Here are two EXPLAIN ANALYSE results for the same query, with
> two different indexes. The two indexes are identical except that one
> is btree and the other GiST.
>
> Here is the query:
>
> SELECT *
> FROM
> location l1,
> location l2,
> gene,
> primer
> WHERE
> l1.subjectid <> l2.subjectid
> AND l1.objectid = l2.objectid
> AND l1.subjectid = gene.id
> AND l2.subjectid = primer.id
> AND l2.intermine_start <= l1.intermine_start
> AND l2.intermine_end >= l1.intermine_start
>
> Here is the btree index:
>
> CREATE INDEX location_object_start ON location (objectid,
> intermine_start);
>
> QUERY PLAN
> ----------------------------------------------------------------------
> Hash Join
> (cost=26213.16..135980894.76 rows=3155740824 width=484)
> (actual time=2799.260..14256.588 rows=2758 loops=1)
> Hash Cond: (l1.subjectid = gene.id)
> -> Nested Loop
> (cost=0.00..4364485.01 rows=8891802645 width=324)
> (actual time=9.748..10418.807 rows=390695 loops=1)
> Join Filter: (l1.subjectid <> l2.subjectid)
> -> Nested Loop
> (cost=0.00..446862.58 rows=572239 width=259)
> (actual time=9.720..4226.117 rows=211880 loops=1)
> -> Seq Scan on primer
> (cost=0.00..15358.80 rows=211880 width=194)
> (actual time=9.678..579.877 rows=211880 loops=1)
> -> Index Scan using location__key_all on location l2
> (cost=0.00..2.00 rows=3 width=65)
> (actual time=0.004..0.007 rows=1 loops=211880)
> Index Cond: (l2.subjectid = primer.id)
> -> Index Scan using location_object_start on location l1
> (cost=0.00..3.85 rows=150 width=65)
> (actual time=0.005..0.012 rows=3 loops=211880)
> Index Cond: ((l1.objectid = l2.objectid) AND
> (l2.intermine_start <= l1.intermine_start) AND (l2.intermine_end >=
> l1.intermine_start))
> -> Hash
> (cost=20496.96..20496.96 rows=457296 width=160)
> (actual time=2788.698..2788.698 rows=457296 loops=1)
> -> Seq Scan on gene
> (cost=0.00..20496.96 rows=457296 width=160)
> (actual time=0.038..1420.604 rows=457296 loops=1)
> Total runtime: 14263.846 ms
> (13 rows)
>
>
> Here is the GiST index:
>
> CREATE INDEX location_object_start_gist ON location USING gist
> (objectid, intermine_start);
>
> QUERY PLAN
> ------------------------------------------------------------------------
> Hash Join
> (cost=26213.16..136159960.32 rows=3155740824 width=484)
> (actual time=2576.109..2300486.267 rows=2758 loops=1)
> Hash Cond: (l1.subjectid = gene.id)
> -> Nested Loop
> (cost=0.00..4543550.56 rows=8891802645 width=324)
> (actual time=366.121..2296668.740 rows=390695 loops=1)
> Join Filter: (l1.subjectid <> l2.subjectid)
> -> Nested Loop
> (cost=0.00..446862.58 rows=572239 width=259)
> (actual time=362.774..13423.443 rows=211880 loops=1)
> -> Seq Scan on primer
> (cost=0.00..15358.80 rows=211880 width=194)
> (actual time=319.559..1296.907 rows=211880 loops=1)
> -> Index Scan using location__key_all on location l2
> (cost=0.00..2.00 rows=3 width=65)
> (actual time=0.041..0.045 rows=1 loops=211880)
> Index Cond: (l2.subjectid = primer.id)
> -> Index Scan using location_object_start_gist on location l1
> (cost=0.00..4.16 rows=150 width=65)
> (actual time=3.354..10.757 rows=3 loops=211880)
> Index Cond: ((l1.objectid = l2.objectid) AND
> (l2.intermine_start <= l1.intermine_start) AND (l2.intermine_end >=
> l1.intermine_start))
> -> Hash
> (cost=20496.96..20496.96 rows=457296 width=160)
> (actual time=2157.914..2157.914 rows=457296 loops=1)
> -> Seq Scan on gene
> (cost=0.00..20496.96 rows=457296 width=160)
> (actual time=3.904..1206.907 rows=457296 loops=1)
> Total runtime: 2300510.674 ms
> (13 rows)
>
> The query plans are identical except in the type of index used, but
> there is a factor of a few hundred in execute time. Is this the kind
> of factor that would be expected, or is there something amiss? Is this
> seen as something that might be improved in the future?
>
> Matthew
>

In response to

GiST index performance at 2009-04-16 16:06:03 from Matthew Wakeling

Responses

Re: GiST index performance at 2009-04-16 17:23:49 from Matthew Wakeling
Re: GiST index performance at 2009-04-16 17:52:35 from Tom Lane
Re: GiST index performance at 2009-04-17 01:22:08 from Craig Ringer

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Matthew Wakeling	2009-04-16 17:23:49	Re: GiST index performance
Previous Message	Tom Lane	2009-04-16 16:46:46	Re: GiST index performance