Skip site navigation (1) Skip section navigation (2)

performance of IN (subquery)

From: Kevin Murphy <murphy(at)genome(dot)chop(dot)edu>
To: pgsql-general(at)postgresql(dot)org
Subject: performance of IN (subquery)
Date: 2004-08-26 22:24:23
Message-ID: AE87AE00-F7AE-11D8-91C6-0003930D3626@genome.chop.edu (view raw or flat)
Thread:
Lists: pgsql-general
I'm using PG 7.4.3 on Mac OS X.

I am disappointed with the performance of queries like 'select foo from  
bar where baz in (subquery)', or updates like 'update bar set foo = 2  
where baz in (subquery)'.  PG always seems to want to do a sequential  
scan of the bar table.  I wish there were a way of telling PG, "use the  
index on baz in your plan, because I know that the subquery will return  
very few results".   Where it really matters, I have been constructing  
dynamic queries by looping over the values for baz and building a  
separate query for each one and combining with a UNION (or just  
directly updating, in the update case).  Depending on the size of the  
bar table, I can get speedups of hundreds or even more than a thousand  
times, but it is a big pain to have to do this.

Any tips?

Thanks,
Kevin Murphy

Illustrated:

The query I want to do is very slow:

select bundle_id from build.elements
where elementid in (
SELECT superlocs_2.element_id
            FROM superlocs_2 NATURAL JOIN bundle_superlocs_2
            WHERE bundle_superlocs_2.protobundle_id = 1);
-----------
       7644
       7644
(2 rows)
Time: 518.242 ms


The subquery is fast:

SELECT superlocs_2.element_id
            FROM superlocs_2 NATURAL JOIN bundle_superlocs_2
            WHERE bundle_superlocs_2.protobundle_id = 1;
------------
       41209
       25047
(2 rows)
Time: 3.268 ms


And using indexes on the main table is fast:

select bundle_id from build.elements
where elementid in (41209, 25047);
-----------
       7644
       7644
(2 rows)
Time: 2.468 ms

The plan for the slow query:

egenome_test=# explain analyze select bundle_id from build.elements
where elementid in (
SELECT superlocs_2.element_id
            FROM superlocs_2 NATURAL JOIN bundle_superlocs_2
            WHERE bundle_superlocs_2.protobundle_id = 1);
egenome_test-# egenome_test(# egenome_test(# egenome_test(#              
                                                  QUERY PLAN              
                         \

------------------------------------------------------------------------ 
-------------------------------------------------------------
  Hash Join  (cost=70.33..72.86 rows=25 width=4) (actual  
time=583.051..583.059 rows=2 loops=1)
    Hash Cond: ("outer".element_id = "inner".elementid)
    ->  HashAggregate  (cost=47.83..47.83 rows=25 width=4) (actual  
time=0.656..0.658 rows=2 loops=1)
          ->  Hash Join  (cost=22.51..47.76 rows=25 width=4) (actual  
time=0.615..0.625 rows=2 loops=1)
                Hash Cond: ("outer".superloc_id = "inner".superloc_id)
                ->  Seq Scan on superlocs_2  (cost=0.00..20.00 rows=1000  
width=8) (actual time=0.004..0.012 rows=9 loops=1)
                ->  Hash  (cost=22.50..22.50 rows=5 width=4) (actual  
time=0.076..0.076 rows=0 loops=1)
                      ->  Seq Scan on bundle_superlocs_2   
(cost=0.00..22.50 rows=5 width=4) (actual time=0.024..0.033 rows=2  
loops=1)
                            Filter: (protobundle_id = 1)
    ->  Hash  (cost=20.00..20.00 rows=1000 width=8) (actual  
time=581.802..581.802 rows=0 loops=1)
          ->  Seq Scan on elements  (cost=0.00..20.00 rows=1000 width=8)  
(actual time=0.172..405.243 rows=185535 loops=1)
  Total runtime: 593.843 ms
(12 rows)


Responses

pgsql-general by date

Next:From: Jan WieckDate: 2004-08-26 22:35:06
Subject: Re: Possible to insert quoted null value into integer field?
Previous:From: Benjamin SmithDate: 2004-08-26 22:04:29
Subject: Re: Constraints to Guarantee unique across tables with foreign key?

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group