Re: hashjoin chosen over 1000x faster plan

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: <pgsql-performance(at)postgresql(dot)org>,"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: hashjoin chosen over 1000x faster plan
Date: 2007-10-10 20:11:48
Message-ID: 470CEBB4.EE98.0025.0@wicourts.gov
Lists: pgsql-performance

>>> On Wed, Oct 10, 2007 at 2:52 PM, in message
<1192045945(dot)4233(dot)351(dot)camel(at)ebony(dot)site>, Simon Riggs <simon(at)2ndquadrant(dot)com>
wrote:
>
> The fast plan is an all-or-nothing plan. It is *only* faster when the
> number of matched rows is zero. You know it is zero, but currently the
> planner doesn't, nor is it able to make use of the information when it
> has it, half thru execution. Even if we could work out the high
> probability of it being zero, we would still be left with the decision
> of whether to optimise for the zero or for the non-zero.

For a different case number, one with four charges (two of them reopened), the default plan is:

Sort  (cost=2450.27..2450.28 rows=4 width=146) (actual time=463.048..463.052 rows=4 loops=1)
  Sort Key: "CH"."chargeNo", "CH"."chargeSeqNo"
  ->  Hash Left Join  (cost=2318.93..2450.23 rows=4 width=146) (actual time=462.857..462.995 rows=4 loops=1)
        Hash Cond: ((("CHST"."eventType")::bpchar = ("CTHE"."eventType")::bpchar) AND (("CHST"."caseType")::bpchar = ("CTHE"."caseType")::bpchar))
        ->  Nested Loop Left Join  (cost=0.00..115.67 rows=4 width=131) (actual time=0.045..0.165 rows=4 loops=1)
              ->  Index Scan using "Charge_pkey" on "Charge" "CH"  (cost=0.00..10.69 rows=4 width=112) (actual time=0.036..0.053 rows=4 loops=1)
                    Index Cond: ((("countyNo")::smallint = 13) AND (("caseNo")::bpchar = '2004CF002575'::bpchar))
              ->  Index Scan using "CaseHist_pkey" on "CaseHist" "CHST"  (cost=0.00..26.18 rows=5 width=41) (actual time=0.018..0.019 rows=0 loops=4)
                    Index Cond: ((("CHST"."countyNo")::smallint = 13) AND (("CHST"."caseNo")::bpchar = '2004CF002575'::bpchar) AND (("CHST"."histSeqNo")::smallint = ("CH"."reopHistSeqNo")::smallint))
        ->  Hash  (cost=2084.82..2084.82 rows=15607 width=98) (actual time=462.780..462.780 rows=15607 loops=1)
              ->  Subquery Scan "CTHE"  (cost=1630.43..2084.82 rows=15607 width=98) (actual time=355.962..433.081 rows=15607 loops=1)
                    ->  Merge Right Join  (cost=1630.43..1928.75 rows=15607 width=89) (actual time=355.960..414.249 rows=15607 loops=1)
                          Merge Cond: (((d."countyNo")::smallint = "inner"."?column9?") AND ((d."caseType")::bpchar = "inner"."?column10?") AND ((d."eventType")::bpchar = "inner"."?column11?"))
                          ->  Index Scan using "CaseTypeHistEventD_pkey" on "CaseTypeHistEventD" d  (cost=0.00..87.77 rows=2051 width=21) (actual time=0.025..0.713 rows=434 loops=1)
                          ->  Sort  (cost=1630.43..1669.45 rows=15607 width=76) (actual time=355.320..365.251 rows=15607 loops=1)
                                Sort Key: (c."countyNo")::smallint, (b."caseType")::bpchar, (b."eventType")::bpchar
                                ->  Nested Loop  (cost=0.00..543.41 rows=15607 width=76) (actual time=0.035..46.914 rows=15607 loops=1)
                                      ->  Index Scan using "ControlRecord_pkey" on "ControlRecord" c  (cost=0.00..4.27 rows=1 width=2) (actual time=0.010..0.019 rows=1 loops=1)
                                            Index Cond: (("countyNo")::smallint = 13)
                                      ->  Seq Scan on "CaseTypeHistEventB" b  (cost=0.00..383.07 rows=15607 width=74) (actual time=0.019..14.069 rows=15607 loops=1)
Total runtime: 464.588 ms
(21 rows)

With set enable_hashjoin = off:

Sort  (cost=3404.68..3404.69 rows=4 width=146) (actual time=448.049..448.053 rows=4 loops=1)
  Sort Key: "CH"."chargeNo", "CH"."chargeSeqNo"
  ->  Merge Left Join  (cost=3287.55..3404.64 rows=4 width=146) (actual time=447.986..448.005 rows=4 loops=1)
        Merge Cond: (("outer"."?column16?" = "inner"."?column5?") AND ("outer"."?column17?" = "inner"."?column6?"))
        ->  Sort  (cost=115.71..115.72 rows=4 width=131) (actual time=0.179..0.182 rows=4 loops=1)
              Sort Key: ("CHST"."caseType")::bpchar, ("CHST"."eventType")::bpchar
              ->  Nested Loop Left Join  (cost=0.00..115.67 rows=4 width=131) (actual time=0.051..0.139 rows=4 loops=1)
                    ->  Index Scan using "Charge_pkey" on "Charge" "CH"  (cost=0.00..10.69 rows=4 width=112) (actual time=0.040..0.053 rows=4 loops=1)
                          Index Cond: ((("countyNo")::smallint = 13) AND (("caseNo")::bpchar = '2004CF002575'::bpchar))
                    ->  Index Scan using "CaseHist_pkey" on "CaseHist" "CHST"  (cost=0.00..26.18 rows=5 width=41) (actual time=0.013..0.014 rows=0 loops=4)
                          Index Cond: ((("CHST"."countyNo")::smallint = 13) AND (("CHST"."caseNo")::bpchar = '2004CF002575'::bpchar) AND (("CHST"."histSeqNo")::smallint = ("CH"."reopHistSeqNo")::smallint))
        ->  Sort  (cost=3171.84..3210.86 rows=15607 width=98) (actual time=446.459..446.936 rows=768 loops=1)
              Sort Key: ("CTHE"."caseType")::bpchar, ("CTHE"."eventType")::bpchar
              ->  Subquery Scan "CTHE"  (cost=1630.43..2084.82 rows=15607 width=98) (actual time=322.928..405.654 rows=15607 loops=1)
                    ->  Merge Right Join  (cost=1630.43..1928.75 rows=15607 width=89) (actual time=322.922..381.371 rows=15607 loops=1)
                          Merge Cond: (((d."countyNo")::smallint = "inner"."?column9?") AND ((d."caseType")::bpchar = "inner"."?column10?") AND ((d."eventType")::bpchar = "inner"."?column11?"))
                          ->  Index Scan using "CaseTypeHistEventD_pkey" on "CaseTypeHistEventD" d  (cost=0.00..87.77 rows=2051 width=21) (actual time=0.024..0.734 rows=434 loops=1)
                          ->  Sort  (cost=1630.43..1669.45 rows=15607 width=76) (actual time=322.294..332.182 rows=15607 loops=1)
                                Sort Key: (c."countyNo")::smallint, (b."caseType")::bpchar, (b."eventType")::bpchar
                                ->  Nested Loop  (cost=0.00..543.41 rows=15607 width=76) (actual time=0.035..45.539 rows=15607 loops=1)
                                      ->  Index Scan using "ControlRecord_pkey" on "ControlRecord" c  (cost=0.00..4.27 rows=1 width=2) (actual time=0.010..0.016 rows=1 loops=1)
                                            Index Cond: (("countyNo")::smallint = 13)
                                      ->  Seq Scan on "CaseTypeHistEventB" b  (cost=0.00..383.07 rows=15607 width=74) (actual time=0.019..13.754 rows=15607 loops=1)
Total runtime: 449.660 ms
(24 rows)

So in all cases it is faster without the hashjoin; it's just a
question of whether it is 4% faster or 1000 times faster, with a 99+%
chance of being 1000 times faster.
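
(For anyone wanting to reproduce this sort of comparison: it only takes a
session-level planner setting. The sketch below uses throwaway tables rather
than our schema, since the actual query isn't included in this message, so
treat it as an illustration of the mechanics, not of these plans.)

-- Illustration only: toy tables, not the query from this thread.
CREATE TEMP TABLE big AS
  SELECT g AS id, g % 100 AS grp FROM generate_series(1, 100000) g;
CREATE TEMP TABLE small AS
  SELECT g AS grp FROM generate_series(1, 100) g;
ANALYZE big;
ANALYZE small;

EXPLAIN ANALYZE SELECT * FROM big JOIN small USING (grp);  -- normally a hash join

SET enable_hashjoin = off;  -- discourage hash joins for this session only
EXPLAIN ANALYZE SELECT * FROM big JOIN small USING (grp);  -- planner falls back to another join method
RESET enable_hashjoin;      -- restore the default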

This may get back to a question I've always had about the wisdom of
rounding fractional reads to whole numbers. You lose information
which might lead to better plan choices. You can't read half a row,
but you can read one row half the time.
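
To put made-up numbers on it: say an inner index probe costs about 26 (roughly
the "CaseHist" probe above) and the true expectation is half a matching row
per outer row. The expected cost per outer row is about 13, but an estimate
rounded up to one whole row charges the full 26, and gaps like that are what
can tip the planner toward the wrong join. (Illustrative arithmetic only; the
figures are invented, not taken from the planner.)

SELECT 0.5 * 26.18 AS cost_using_fractional_estimate,     -- 13.09
       1.0 * 26.18 AS cost_after_rounding_up_to_one_row;  -- 26.18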

-Kevin
