Re: sub-select in IN clause results in sequential scan

From: Anj Adu <fotographs(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Angayarkanni <kangayarkanni(at)gmail(dot)com>, Grzegorz Jaśkiewicz <gryzman(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: sub-select in IN clause results in sequential scan
Date: 2009-10-29 15:40:20
Message-ID: f2fd819a0910290840t79a1d742sb1ac9fb2b12f234b@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I had posted this on another thread..but did not get a response..here
it is again

explain analyze select thedate,sent.watch as wat, nod.point as fwl,
act.acttype, intf.pointofcontact, func.getNum(snum) as sss,
func.getNum(dnum) as ddd, dddport,

aaa.aaacol,szone.ssszn as ssszone, dzone.dddzn as
dddzone,snippets,timea,total from (select date_trunc('day',thedate) as

thedate,watch_id,point_id,acttype_id,pointofcontact_id,snum,dnum,dddport,aaacol_id,ssszone_id,dddzone_id,sum(snippets)
as snippets,sum(timea) as timea,sum(summcount) as total

from realdev_date_facet where thedate between '2009-10-17' and
'2009-10-24' and watch_id in (3) group by

date_trunc('day',thedate),watch_id,point_id,acttype_id,pointofcontact_id,snum,dnum,dddport,aaacol_id,ssszone_id,dddzone_id,snippets,timea
order by 1) a left outer join

realdev_pointofcontact intf on a.pointofcontact_id =
intf.pointofcontact_id left outer join realdev_ssszn szone on
a.ssszone_id = szone.ssszn_id left outer join realdev_dddzn

dzone on a.dddzone_id = dzone.dddzn_id left outer join realdev_aaacol
aaa on a.aaacol_id = aaa.aaacol_id, realdev_watch sent, realdev_point
nod, realdev_acttype act where

a.watch_id = sent.watch_id and a.point_id = nod.point_id and
a.acttype_id = act.acttype_id

Slow Query (with IN clause sub-select)
--------------------------------------------------------------------------------------
Hash Join (cost=2436528.60..2493232.81 rows=310708 width=996)
(actual time=144303.550..144609.576 rows=7294 loops=1)
Hash Cond: ("outer".watch_id = "inner".watch_id)
-> Hash Join (cost=2436513.10..2487003.15 rows=310708 width=854)
(actual time=144222.468..144287.330 rows=7294 loops=1)
Hash Cond: ("outer".point_id = "inner".point_id)
-> Hash Join (cost=2436497.60..2482327.03 rows=310708
width=712) (actual time=144222.358..144281.371 rows=7294 loops=1)
Hash Cond: ("outer".acttype_id = "inner".acttype_id)
-> Hash Left Join (cost=2436477.97..2477646.78
rows=310708 width=648) (actual time=144222.319..144275.382 rows=7294
loops=1)
Hash Cond: ("outer".aaacol_id = "inner".aaacol_id)
-> Hash Left Join (cost=2436457.35..2472965.54
rows=310708 width=594) (actual time=144222.267..144269.326 rows=7294
loops=1)
Hash Cond: ("outer".dddzone_id = "inner".dddzn_id)
-> Hash Left Join
(cost=2436440.85..2468288.42 rows=310708 width=480) (actual
time=144222.153..144263.530 rows=7294 loops=1)
Hash Cond: ("outer".ssszone_id =
"inner".ssszn_id)
-> Hash Left Join
(cost=2436426.97..2463613.92 rows=310708 width=266) (actual
time=144222.009..144257.037 rows=7294 loops=1)
Hash Cond:
("outer".pointofcontact_id = "inner".pointofcontact_id)
-> GroupAggregate
(cost=2436410.47..2455829.72 rows=310708 width=80) (actual
time=144221.980..144252.195 rows=7294 loops=1)
-> Sort
(cost=2436410.47..2437187.24 rows=310708 width=80) (actual
time=144221.950..144224.805 rows=10248 loops=1)
Sort Key:
date_trunc('day'::text, public.realdev_date_facet.thedate),
public.realdev_date_facet.watch_id,

public.realdev_date_facet.point_id,
public.realdev_date_facet.acttype_id,
public.realdev_date_facet.pointofcontact_id,
public.realdev_date_facet.snum,

public.realdev_date_facet.dnum, public.realdev_date_facet.dddport,
public.realdev_date_facet.aaacol_id,
public.realdev_date_facet.ssszone_id,

public.realdev_date_facet.dddzone_id,
public.realdev_date_facet.snippets, public.realdev_date_facet.timea
-> Hash IN Join
(cost=15.51..2408065.83 rows=310708 width=80) (actual
time=73.279..144105.862 rows=10248 loops=1)
Hash Cond:
("outer".watch_id = "inner".watch_id)
-> Append
(cost=0.00..2062387.41 rows=68355812 width=80) (actual
time=8.161..-17465745.684 rows=68355711 loops=1)
->
Index Scan using realdev_dy_dim_idx1 on realdev_date_facet
(cost=0.00..3.25 rows=1 width=80) (actual

time=0.040..0.040 rows=0 loops=1)

Index Cond: ((thedate >= '2009-10-17 00:00:00'::timestamp without time
zone) AND (thedate <= '2009-10-24

00:00:00'::timestamp without time zone))
-> Seq
Scan on realdev_date_facet_2009_10_17 realdev_date_facet
(cost=0.00..216426.39 rows=7166959 width=80)

(actual time=8.119..11012.923 rows=7166717 loops=1)

Filter: ((thedate >= '2009-10-17 00:00:00'::timestamp without time
zone) AND (thedate <= '2009-10-24

00:00:00'::timestamp without time zone))
-> Seq
Scan on realdev_date_facet_2009_10_18 realdev_date_facet
(cost=0.00..250263.65 rows=8291577 width=80)

(actual time=7.419..18751.080 rows=8291095 loops=1)

Filter: ((thedate >= '2009-10-17 00:00:00'::timestamp without time
zone) AND (thedate <= '2009-10-24

00:00:00'::timestamp without time zone))
-> Seq
Scan on realdev_date_facet_2009_10_19 realdev_date_facet
(cost=0.00..289231.36 rows=9589091 width=80)

(actual time=0.027..19666.968 rows=9589432 loops=1)

Filter: ((thedate >= '2009-10-17 00:00:00'::timestamp without time
zone) AND (thedate <= '2009-10-24

00:00:00'::timestamp without time zone))
-> Seq
Scan on realdev_date_facet_2009_10_20 realdev_date_facet
(cost=0.00..288674.88 rows=9572392 width=80)

(actual time=0.029..12557.198 rows=9572601 loops=1)

Filter: ((thedate >= '2009-10-17 00:00:00'::timestamp without time
zone) AND (thedate <= '2009-10-24

00:00:00'::timestamp without time zone))
-> Seq
Scan on realdev_date_facet_2009_10_21 realdev_date_facet
(cost=0.00..269963.64 rows=8949976 width=80)

(actual time=0.036..9544.469 rows=8950605 loops=1)

Filter: ((thedate >= '2009-10-17 00:00:00'::timestamp without time
zone) AND (thedate <= '2009-10-24

00:00:00'::timestamp without time zone))
-> Seq
Scan on realdev_date_facet_2009_10_22 realdev_date_facet
(cost=0.00..274093.95 rows=9089330 width=80)

(actual time=0.027..26397891.108 rows=9088813 loops=1)

Filter: ((thedate >= '2009-10-17 00:00:00'::timestamp without time
zone) AND (thedate <= '2009-10-24

00:00:00'::timestamp without time zone))
-> Seq
Scan on realdev_date_facet_2009_10_23 realdev_date_facet
(cost=0.00..253855.74 rows=8417049 width=80)

(actual time=0.027..9165.289 rows=8417659 loops=1)

Filter: ((thedate >= '2009-10-17 00:00:00'::timestamp without time
zone) AND (thedate <= '2009-10-24

00:00:00'::timestamp without time zone))
-> Seq
Scan on realdev_date_facet_2009_10_24 realdev_date_facet
(cost=0.00..219874.55 rows=7279437 width=80)

(actual time=0.035..13203440.555 rows=7278789 loops=1)

Filter: ((thedate >= '2009-10-17 00:00:00'::timestamp without time
zone) AND (thedate <= '2009-10-24

00:00:00'::timestamp without time zone))
-> Hash
(cost=15.50..15.50 rows=2 width=4) (actual time=0.025..0.025 rows=1
loops=1)
-> Seq
Scan on realdev_watch (cost=0.00..15.50 rows=2 width=4) (actual
time=0.012..0.020 rows=1 loops=1)

Filter: ((watch)::text ~~ 'searchtext%'::text)
-> Hash (cost=15.20..15.20
rows=520 width=122) (actual time=0.002..0.002 rows=0 loops=1)
-> Seq Scan on
realdev_pointofcontact intf (cost=0.00..15.20 rows=520 width=122)
(actual time=0.001..0.001 rows=0 loops=1)
-> Hash (cost=13.10..13.10 rows=310
width=222) (actual time=0.131..0.131 rows=100 loops=1)
-> Seq Scan on realdev_ssszn
szone (cost=0.00..13.10 rows=310 width=222) (actual time=0.028..0.076
rows=100 loops=1)
-> Hash (cost=15.20..15.20 rows=520
width=122) (actual time=0.103..0.103 rows=85 loops=1)
-> Seq Scan on realdev_dddzn dzone
(cost=0.00..15.20 rows=520 width=122) (actual time=0.016..0.054
rows=85 loops=1)
-> Hash (cost=18.50..18.50 rows=850 width=62)
(actual time=0.038..0.038 rows=7 loops=1)
-> Seq Scan on realdev_aaacol aaa
(cost=0.00..18.50 rows=850 width=62) (actual time=0.028..0.031 rows=7
loops=1)
-> Hash (cost=17.70..17.70 rows=770 width=72) (actual
time=0.027..0.027 rows=6 loops=1)
-> Seq Scan on realdev_acttype act
(cost=0.00..17.70 rows=770 width=72) (actual time=0.018..0.020 rows=6
loops=1)
-> Hash (cost=14.40..14.40 rows=440 width=150) (actual
time=0.099..0.099 rows=69 loops=1)
-> Seq Scan on realdev_point nod (cost=0.00..14.40
rows=440 width=150) (actual time=0.024..0.064 rows=69 loops=1)
-> Hash (cost=14.40..14.40 rows=440 width=150) (actual
time=0.055..0.055 rows=30 loops=1)
-> Seq Scan on realdev_watch sent (cost=0.00..14.40
rows=440 width=150) (actual time=0.021..0.037 rows=30 loops=1)
Total runtime: 144613.558 ms

=================================================================================================================================================================================

FAST Query (With hardcode IN value)

Hash Join (cost=1222737.69..1277695.72 rows=448637 width=996) (actual
time=37125.501..37783.520 rows=7294 loops=1)
Hash Cond: ("outer".watch_id = "inner".watch_id)
-> Hash Join (cost=1222722.19..1268707.48 rows=448637 width=854)
(actual time=37122.482..37166.714 rows=7294 loops=1)
Hash Cond: ("outer".point_id = "inner".point_id)
-> Hash Join (cost=1222706.69..1261962.43 rows=448637
width=712) (actual time=37122.389..37160.697 rows=7294 loops=1)
Hash Cond: ("outer".acttype_id = "inner".acttype_id)
-> Hash Left Join (cost=1222687.07..1255213.25
rows=448637 width=648) (actual time=37122.335..37154.030 rows=7294
loops=1)
Hash Cond: ("outer".aaacol_id = "inner".aaacol_id)
-> Hash Left Join (cost=1222666.44..1248463.07
rows=448637 width=594) (actual time=37122.306..37147.818 rows=7294
loops=1)
Hash Cond: ("outer".dddzone_id = "inner".dddzn_id)
-> Hash Left Join
(cost=1222649.94..1241717.01 rows=448637 width=480) (actual
time=37122.194..37140.144 rows=7294 loops=1)
Hash Cond: ("outer".ssszone_id =
"inner".ssszn_id)
-> Hash Left Join
(cost=1222636.07..1234973.58 rows=448637 width=266) (actual
time=37122.076..37133.362 rows=7294 loops=1)
Hash Cond:
("outer".pointofcontact_id = "inner".pointofcontact_id)
-> Sort
(cost=1222619.57..1223741.16 rows=448637 width=80) (actual
time=37122.054..37124.857 rows=7294 loops=1)
Sort Key:
date_trunc('day'::text, public.realdev_date_facet.thedate)
-> HashAggregate
(cost=1171530.60..1180503.34 rows=448637 width=80) (actual
time=37098.239..37108.376 rows=7294 loops=1)
-> Result
(cost=0.00..1120386.08 rows=1278613 width=80) (actual
time=8010.438..37052.063 rows=10248 loops=1)
-> Append
(cost=0.00..1117189.55 rows=1278613 width=80) (actual
time=8010.420..37032.106 rows=10248 loops=1)
->
Index Scan using realdev_dy_dim_idx1 on realdev_date_facet
(cost=0.00..2.69 rows=1 width=80) (actual

time=0.027..0.027 rows=0 loops=1)

Index Cond: ((thedate >= '2009-10-17 00:00:00'::timestamp without time
zone) AND (thedate <= '2009-10-24

00:00:00'::timestamp without time zone) AND (watch_id = 3))
->
Bitmap Heap Scan on realdev_date_facet_2009_10_17 realdev_date_facet
(cost=1175.35..116994.25 rows=184386

width=80) (actual time=8010.391..8027.057 rows=1025 loops=1)

Recheck Cond: (watch_id = 3)

Filter: ((thedate >= '2009-10-17 00:00:00'::timestamp without time
zone) AND (thedate <= '2009-10-24

00:00:00'::timestamp without time zone))

-> Bitmap Index Scan on realdev_date_facet_2009_10_17_watch_id
(cost=0.00..1175.35 rows=184386 width=0)

(actual time=8010.057..8010.057 rows=1025 loops=1)

Index Cond: (watch_id = 3)
->
Bitmap Heap Scan on realdev_date_facet_2009_10_18 realdev_date_facet
(cost=898.09..135163.52 rows=141169

width=80) (actual time=7926.811..7941.851 rows=985 loops=1)

Recheck Cond: (watch_id = 3)

Filter: ((thedate >= '2009-10-17 00:00:00'::timestamp without time
zone) AND (thedate <= '2009-10-24

00:00:00'::timestamp without time zone))

-> Bitmap Index Scan on realdev_date_facet_2009_10_18_watch_id
(cost=0.00..898.09 rows=141169 width=0)

(actual time=7926.583..7926.583 rows=985 loops=1)

Index Cond: (watch_id = 3)
->
Bitmap Heap Scan on realdev_date_facet_2009_10_19 realdev_date_facet
(cost=1068.33..156545.18 rows=167809

width=80) (actual time=210.303..230.478 rows=1277 loops=1)

Recheck Cond: (watch_id = 3)

Filter: ((thedate >= '2009-10-17 00:00:00'::timestamp without time
zone) AND (thedate <= '2009-10-24

00:00:00'::timestamp without time zone))

-> Bitmap Index Scan on realdev_date_facet_2009_10_19_watch_id
(cost=0.00..1068.33 rows=167809 width=0)

(actual time=209.980..209.980 rows=1277 loops=1)

Index Cond: (watch_id = 3)
->
Bitmap Heap Scan on realdev_date_facet_2009_10_20 realdev_date_facet
(cost=1076.96..156331.90 rows=168846

width=80) (actual time=3388.336..3475.603 rows=1508 loops=1)

Recheck Cond: (watch_id = 3)

Filter: ((thedate >= '2009-10-17 00:00:00'::timestamp without time
zone) AND (thedate <= '2009-10-24

00:00:00'::timestamp without time zone))

-> Bitmap Index Scan on realdev_date_facet_2009_10_20_watch_id
(cost=0.00..1076.96 rows=168846 width=0)

(actual time=3387.985..3387.985 rows=1508 loops=1)

Index Cond: (watch_id = 3)
->
Bitmap Heap Scan on realdev_date_facet_2009_10_21 realdev_date_facet
(cost=959.30..145554.92 rows=150658

width=80) (actual time=9787.370..9807.884 rows=1383 loops=1)

Recheck Cond: (watch_id = 3)

Filter: ((thedate >= '2009-10-17 00:00:00'::timestamp without time
zone) AND (thedate <= '2009-10-24

00:00:00'::timestamp without time zone))

-> Bitmap Index Scan on realdev_date_facet_2009_10_21_watch_id
(cost=0.00..959.30 rows=150658 width=0)

(actual time=9787.039..9787.039 rows=1383 loops=1)

Index Cond: (watch_id = 3)
->
Bitmap Heap Scan on realdev_date_facet_2009_10_22 realdev_date_facet
(cost=1165.49..149480.07 rows=182999

width=80) (actual time=6884.397..6970.130 rows=1625 loops=1)

Recheck Cond: (watch_id = 3)

Filter: ((thedate >= '2009-10-17 00:00:00'::timestamp without time
zone) AND (thedate <= '2009-10-24

00:00:00'::timestamp without time zone))

-> Bitmap Index Scan on realdev_date_facet_2009_10_22_watch_id
(cost=0.00..1165.49 rows=182999 width=0)

(actual time=6884.011..6884.011 rows=1625 loops=1)

Index Cond: (watch_id = 3)
->
Bitmap Heap Scan on realdev_date_facet_2009_10_23 realdev_date_facet
(cost=984.91..137907.55 rows=154546

width=80) (actual time=307.460..333.678 rows=1395 loops=1)

Recheck Cond: (watch_id = 3)

Filter: ((thedate >= '2009-10-17 00:00:00'::timestamp without time
zone) AND (thedate <= '2009-10-24

00:00:00'::timestamp without time zone))

-> Bitmap Index Scan on realdev_date_facet_2009_10_23_watch_id
(cost=0.00..984.91 rows=154546 width=0)

(actual time=307.150..307.150 rows=1395 loops=1)

Index Cond: (watch_id = 3)
->
Bitmap Heap Scan on realdev_date_facet_2009_10_24 realdev_date_facet
(cost=816.70..119209.47 rows=128199

width=80) (actual time=214.640..239.955 rows=1050 loops=1)

Recheck Cond: (watch_id = 3)

Filter: ((thedate >= '2009-10-17 00:00:00'::timestamp without time
zone) AND (thedate <= '2009-10-24

00:00:00'::timestamp without time zone))

-> Bitmap Index Scan on realdev_date_facet_2009_10_24_watch_id
(cost=0.00..816.70 rows=128199 width=0)

(actual time=214.276..214.276 rows=1050 loops=1)

Index Cond: (watch_id = 3)
-> Hash (cost=15.20..15.20
rows=520 width=122) (actual time=0.003..0.003 rows=0 loops=1)
-> Seq Scan on
realdev_pointofcontact intf (cost=0.00..15.20 rows=520 width=122)
(actual time=0.002..0.002 rows=0 loops=1)
-> Hash (cost=13.10..13.10 rows=310
width=222) (actual time=0.111..0.111 rows=100 loops=1)
-> Seq Scan on realdev_ssszn
szone (cost=0.00..13.10 rows=310 width=222) (actual time=0.011..0.065
rows=100 loops=1)
-> Hash (cost=15.20..15.20 rows=520
width=122) (actual time=0.096..0.096 rows=85 loops=1)
-> Seq Scan on realdev_dddzn dzone
(cost=0.00..15.20 rows=520 width=122) (actual time=0.006..0.049
rows=85 loops=1)
-> Hash (cost=18.50..18.50 rows=850 width=62)
(actual time=0.016..0.016 rows=7 loops=1)
-> Seq Scan on realdev_aaacol aaa
(cost=0.00..18.50 rows=850 width=62) (actual time=0.006..0.009 rows=7
loops=1)
-> Hash (cost=17.70..17.70 rows=770 width=72) (actual
time=0.041..0.041 rows=6 loops=1)
-> Seq Scan on realdev_acttype act
(cost=0.00..17.70 rows=770 width=72) (actual time=0.032..0.035 rows=6
loops=1)
-> Hash (cost=14.40..14.40 rows=440 width=150) (actual
time=0.080..0.080 rows=69 loops=1)
-> Seq Scan on realdev_point nod (cost=0.00..14.40
rows=440 width=150) (actual time=0.007..0.040 rows=69 loops=1)
-> Hash (cost=14.40..14.40 rows=440 width=150) (actual
time=0.055..0.055 rows=30 loops=1)
-> Seq Scan on realdev_watch sent (cost=0.00..14.40
rows=440 width=150) (actual time=0.020..0.039 rows=30 loops=1)
Total runtime: 37790.144 ms

On Thu, Oct 29, 2009 at 8:35 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, Oct 29, 2009 at 10:10 AM, Anj Adu <fotographs(at)gmail(dot)com> wrote:
>> Join did not help. A sequential scan is still being done. The
>> hardcoded value in the IN clause performs the best. The time
>> difference is more than an order of magnitude.
>
> If you want help debugging a performance problem, you need to post
> your EXPLAIN ANALYZE results.
>
> http://wiki.postgresql.org/wiki/Guide_to_reporting_problems
>
> ...Robert
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Chris Ernst 2009-10-29 15:58:36 Re: database size growing continously
Previous Message Robert Haas 2009-10-29 15:35:29 Re: sub-select in IN clause results in sequential scan