Re: EXISTS optimization

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>,<pgsql-performance(at)postgresql(dot)org>, "Andrea Olson" <Aolson(dot)CCAP(dot)Courts(at)wicourts(dot)gov>, "Bill Severson" <BSEVERS(dot)CCAP(dot)Courts(at)wicourts(dot)gov>, "John Hutchins" <jhutchi(dot)CCAP(dot)Courts(at)wicourts(dot)gov>, "Randy Peterson" <RPETERS(dot)CCAP(dot)Courts(at)wicourts(dot)gov>, "Shannon Spranger" <ssprang(dot)CCAP(dot)Courts(at)wicourts(dot)gov>
Subject: Re: EXISTS optimization
Date: 2007-03-23 22:26:04
Message-ID: 46040DAC.EE98.0025.0@wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

>>> On Fri, Mar 23, 2007 at 4:49 PM, in message <25339(dot)1174686582(at)sss(dot)pgh(dot)pa(dot)us>,
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
>> explain analyze
>> SELECT "A"."adjustmentNo", "A"."tranNo", "A"."countyNo", "H"."date",
> "H"."userId", "H"."time"
>> FROM "Adjustment" "A"
>> JOIN "TranHeader" "H" ON ("H"."tranId" = "A"."adjustmentNo" AND
> "H"."countyNo" = "A"."countyNo" AND "H"."tranNo" = "A"."tranNo")
>> WHERE "H"."tranType" = 'A'
>> AND "A"."date" > DATE '2006- 01- 01'
>> AND "H"."countyNo" = 66
>> AND "A"."countyNo" = 66
>> AND EXISTS
>> (
>> SELECT 1 FROM "TranDetail" "D"
>> WHERE "D"."tranNo" = "H"."tranNo"
>> AND "D"."countyNo" = "H"."countyNo"
>> AND "D"."caseNo" LIKE '2006TR%'
>> )
>> ;
>
>> The commercial product scans the index on caseNo in TranDetail to build a
> work table of unique values, then uses indexed access to the TranHeader and
> then to Adjustment.
>
> If you want that, try rewriting the EXISTS to an IN:
>
> AND ("H"."tranNo", "H"."countyNo") IN
> (
> SELECT "D"."tranNo", "D"."countyNo" FROM "TranDetail" "D"
> WHERE "D"."caseNo" LIKE '2006TR%'
> )

Nice. I get this:

explain analyze
SELECT "A"."adjustmentNo", "A"."tranNo", "A"."countyNo", "H"."date", "H"."userId", "H"."time"
FROM "Adjustment" "A"
JOIN "TranHeader" "H" ON ("H"."tranId" = "A"."adjustmentNo" AND "H"."countyNo" = "A"."countyNo" AND "H"."tranNo" = "A"."tranNo")
WHERE "H"."tranType" = 'A'
AND "A"."date" > DATE '2006- 01- 01'
AND "H"."countyNo" = 66
AND "A"."countyNo" = 66
AND ("H"."tranNo", "H"."countyNo") IN
(
SELECT "D"."tranNo", "D"."countyNo" FROM "TranDetail" "D"
WHERE "D"."caseNo" LIKE '2006TR%'
)
;

Nested Loop (cost=27.76..36.38 rows=1 width=46) (actual time=92.999..200.398 rows=2209 loops=1)
Join Filter: (("H"."tranNo")::integer = ("A"."tranNo")::integer)
-> Nested Loop (cost=27.76..32.08 rows=1 width=50) (actual time=92.970..176.472 rows=2209 loops=1)
-> HashAggregate (cost=27.76..27.77 rows=1 width=6) (actual time=92.765..100.810 rows=9788 loops=1)
-> Index Scan using "TranDetail_TranDetCaseNo" on "TranDetail" "D" (cost=0.00..27.66 rows=20 width=6) (actual time=0.059..60.967 rows=46301 loops=1)
Index Cond: ((("caseNo")::bpchar >= '2006TR'::bpchar) AND (("caseNo")::bpchar < '2006TS'::bpchar) AND (("countyNo")::smallint = 66))
Filter: (("caseNo")::bpchar ~~ '2006TR%'::text)
-> Index Scan using "TranHeader_pkey" on "TranHeader" "H" (cost=0.00..4.30 rows=1 width=46) (actual time=0.006..0.006 rows=0 loops=9788)
Index Cond: ((("H"."tranNo")::integer = ("D"."tranNo")::integer) AND (("H"."countyNo")::smallint = 66))
Filter: (("tranType")::bpchar = 'A'::bpchar)
-> Index Scan using "Adjustment_pkey" on "Adjustment" "A" (cost=0.00..4.28 rows=1 width=22) (actual time=0.008..0.009 rows=1 loops=2209)
Index Cond: ((("H"."tranId")::bpchar = ("A"."adjustmentNo")::bpchar) AND (("A"."countyNo")::smallint = 66))
Filter: ((date)::date > '2006-01-01'::date)
Total runtime: 201.306 ms

That's the good news. The bad news is that I operate under a management portability dictate which doesn't currently allow that syntax, since not all of the products they want to cover support it. I tried something which seems equivalent, but it is running for a very long time. I'll show it with just the explain while I wait to see how long the explain analyze takes.

explain
SELECT "A"."adjustmentNo", "A"."tranNo", "A"."countyNo", "H"."date", "H"."userId", "H"."time"
FROM "Adjustment" "A"
JOIN "TranHeader" "H" ON ("H"."tranId" = "A"."adjustmentNo" AND "H"."countyNo" = "A"."countyNo" AND "H"."tranNo" = "A"."tranNo")
WHERE "H"."tranType" = 'A'
AND "A"."date" > DATE '2006- 01- 01'
AND "H"."countyNo" = 66
AND "A"."countyNo" = 66
AND "H"."tranNo" IN
(
SELECT "D"."tranNo" FROM "TranDetail" "D"
WHERE "D"."caseNo" LIKE '2006TR%'
AND "D"."countyNo" = "H"."countyNo"
)
;

Nested Loop (cost=0.00..181673.08 rows=1 width=46)
Join Filter: (("H"."tranId")::bpchar = ("A"."adjustmentNo")::bpchar)
-> Seq Scan on "Adjustment" "A" (cost=0.00..2384.27 rows=11733 width=22)
Filter: (((date)::date > '2006-01-01'::date) AND (("countyNo")::smallint = 66))
-> Index Scan using "TranHeader_pkey" on "TranHeader" "H" (cost=0.00..15.27 rows=1 width=46)
Index Cond: ((("H"."tranNo")::integer = ("A"."tranNo")::integer) AND (("H"."countyNo")::smallint = 66))
Filter: ((("tranType")::bpchar = 'A'::bpchar) AND (subplan))
SubPlan
-> Index Scan using "TranDetail_TranDetCaseNo" on "TranDetail" "D" (cost=0.00..27.66 rows=20 width=4)
Index Cond: ((("caseNo")::bpchar >= '2006TR'::bpchar) AND (("caseNo")::bpchar < '2006TS'::bpchar) AND (("countyNo")::smallint = ($0)::smallint))
Filter: (("caseNo")::bpchar ~~ '2006TR%'::text)

> We don't currently try to flatten EXISTS into a unique/join plan as we
> do for IN. I seem to recall not doing so when I rewrote IN planning
> because I didn't think it would be exactly semantically equivalent,
> but that was awhile ago. Right at the moment it seems like it ought
> to be equivalent as long as the comparison operators are strict.

There are a great many situations where they are exactly semantically equivalent. In fact, the commercial database product usually generates an identical plan. I could try to work out (or better yet find) a formal description of when that equivalence holds, if someone would be up for implementing it. Barring that, I could see if management would approve some time for me to look at submitting a patch, but I haven't looked at the code involved, so I have no idea of the scale of effort involved yet.

-Kevin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Martijn van Oosterhout 2007-03-23 22:26:41 Re: EXISTS optimization
Previous Message Alvaro Herrera 2007-03-23 21:57:40 Re: [COMMITTERS] pgsql: We no longer need to palloc the VacuumStmt node; keeping it on

Browse pgsql-performance by date

  From Date Subject
Next Message Martijn van Oosterhout 2007-03-23 22:26:41 Re: EXISTS optimization
Previous Message Tom Lane 2007-03-23 22:22:39 Re: Strange left outer join performance issue