Re: hash semi join caused by "IN (select ...)"

From: Dave Johansen <davejohansen(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: hash semi join caused by "IN (select ...)"
Date: 2011-05-17 02:44:20
Message-ID: BANLkTinBns6zuaKDVfTMenXYSz=fnxntCQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Mon, May 16, 2011 at 3:30 PM, Clemens Eisserer <linuxhippy(at)gmail(dot)com>wrote:

> Hi,
>
> I have a quite complex, performance sensitive query in a system with a
> few (7) joins:
> select .... from t1 left join t2 .... WHERE id IN (select ....)
>
> For this query the planner evaluates the IN with a hash semi join last,
> and all the joining is done by hash joins for all rows contained in t1.
>
> However when I specify the ids manually (IN (1, 2, 3, 4, 5) the
> planner first does an index lookup on the primary key column id,
> and subsequently does nested loop joins using an index on t2 - which
> gives way better results.
>
> Is there any way to guide the planner to evaluate the IN condition
> first, instead of last?
> Why is the planner behaving this way? (postgresql 8.4.??)
>
> Thank you in advance, Clemens
>
>
> Query plan with IN(select):
>
> Sort (cost=165.77..165.77 rows=2 width=16974) (actual
> time=13.459..13.460 rows=2 loops=1)
> Sort Key: this_.id
> Sort Method: quicksort Memory: 26kB
> -> Hash Semi Join (cost=123.09..165.76 rows=2 width=16974)
> (actual time=12.741..13.432 rows=2 loops=1)
> Hash Cond: (this_.id = kladdenent0_.id)
> -> Hash Left Join (cost=119.17..160.90 rows=348
> width=16974) (actual time=8.765..13.104 rows=342 loops=1)
> Hash Cond: (flugzeug2_.flugzeugtyp_id = flugzeugty3_.id)
> -> Hash Left Join (cost=118.10..155.08 rows=348
> width=16454) (actual time=8.724..12.412 rows=342 loops=1)
> Hash Cond: (flugzeug2_.zaehlertyp_id = bmintype4_.id)
> -> Hash Left Join (cost=117.06..152.71 rows=348
> width=15934) (actual time=8.660..11.786 rows=342 loops=1)
> Hash Cond: (this_.lehrerid = pilot5_.id)
> -> Hash Left Join (cost=96.66..130.46
> rows=348 width=8912) (actual time=6.395..8.899 rows=342 loops=1)
> Hash Cond: (this_.nachid = flugplatz6_.id)
> -> Hash Left Join
> (cost=93.89..122.90 rows=348 width=8370) (actual time=6.354..8.429
> rows=342 loops=1)
> Hash Cond: (this_.flugzeugid =
> flugzeug2_.id)
> -> Hash Left Join
> (cost=23.17..47.04 rows=348 width=7681) (actual time=1.992..3.374
> rows=342 loops=1)
> Hash Cond: (this_.pilotid
> = pilot7_.id)
> -> Hash Left Join
> (cost=2.78..22.04 rows=348 width=659) (actual time=0.044..0.548
> rows=342 loops=1)
> Hash Cond:
> (this_.vonid = flugplatz8_.id)
> -> Seq Scan on
> startkladde this_ (cost=0.00..14.48 rows=348 width=117) (actual
> time=0.004..0.074 rows=342 loops=1)
> -> Hash
> (cost=1.79..1.79 rows=79 width=542) (actual time=0.032..0.032 rows=79
> loops=1)
> -> Seq Scan
> on flugplatz flugplatz8_ (cost=0.00..1.79 rows=79 width=542) (actual
> time=0.003..0.010 rows=79 loops=1)
> -> Hash
> (cost=15.73..15.73 rows=373 width=7022) (actual time=1.938..1.938
> rows=375 loops=1)
> -> Seq Scan on
> pilot pilot7_ (cost=0.00..15.73 rows=373 width=7022) (actual
> time=0.006..0.769 rows=375 loops=1)
> -> Hash (cost=51.43..51.43
> rows=1543 width=689) (actual time=4.351..4.351 rows=1543 loops=1)
> -> Seq Scan on flugzeug
> flugzeug2_ (cost=0.00..51.43 rows=1543 width=689) (actual
> time=0.006..1.615 rows=1543 loops=1)
> -> Hash (cost=1.79..1.79 rows=79
> width=542) (actual time=0.031..0.031 rows=79 loops=1)
> -> Seq Scan on flugplatz
> flugplatz6_ (cost=0.00..1.79 rows=79 width=542) (actual
> time=0.003..0.011 rows=79 loops=1)
> -> Hash (cost=15.73..15.73 rows=373
> width=7022) (actual time=2.236..2.236 rows=375 loops=1)
> -> Seq Scan on pilot pilot5_
> (cost=0.00..15.73 rows=373 width=7022) (actual time=0.005..0.781
> rows=375 loops=1)
> -> Hash (cost=1.02..1.02 rows=2 width=520)
> (actual time=0.005..0.005 rows=2 loops=1)
> -> Seq Scan on bmintype bmintype4_
> (cost=0.00..1.02 rows=2 width=520) (actual time=0.003..0.004 rows=2
> loops=1)
> -> Hash (cost=1.03..1.03 rows=3 width=520) (actual
> time=0.004..0.004 rows=3 loops=1)
> -> Seq Scan on flugzeugtype flugzeugty3_
> (cost=0.00..1.03 rows=3 width=520) (actual time=0.002..0.002 rows=3
> loops=1)
> -> Hash (cost=3.90..3.90 rows=2 width=4) (actual
> time=0.239..0.239 rows=2 loops=1)
> -> Limit (cost=0.00..3.88 rows=2 width=4) (actual
> time=0.202..0.236 rows=2 loops=1)
> -> Index Scan using startkladde_pkey on
> startkladde kladdenent0_ (cost=0.00..56.24 rows=29 width=4) (actual
> time=0.200..0.233 rows=2 loops=1)
> Filter: ((status > 0) OR (id = (-1)))
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

In some cases, I've seen improved results when replacing the IN (...) with =
ANY(ARRAY(...)).
Dave

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Robert Klemme 2011-05-17 07:00:31 Re: [PERFORMANCE] expanding to SAN: which portion best to move
Previous Message Tom Lane 2011-05-16 23:22:48 Re: hash semi join caused by "IN (select ...)"