LIMIT 1 == EXISTS optimization?

From: Richard Rowell <richard(dot)rowell(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: LIMIT 1 == EXISTS optimization?
Date: 2009-10-02 17:46:10
Message-ID: b61e61d30910021046i37234b7fqc38a1376485ce234@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I was just troubleshooting a slow query

SELECT * FROM da_answer a
WHERE
a.provider_id IN ( SELECT visibility_bypass_providers( 0, 0 ) ) OR --
ownership
(
EXISTS ( -- Visibility grant
SELECT v.client_answer_id FROM sp_client_answervisibility v
JOIN sp_sharing_group_provider_tree t ON v.sharing_group_id =
t.sharing_group_id AND t.provider_id = 0
WHERE
v.client_answer_id = a.answer_id AND v.visible = TRUE
) AND NOT EXISTS ( -- Visibility deny
SELECT v.client_answer_id FROM sp_client_answervisibility v
JOIN sp_sharing_group_provider_tree t ON v.sharing_group_id =
t.sharing_group_id AND t.provider_id = 0
WHERE
v.client_answer_id = a.answer_id AND v.visible = FALSE
) AND --ROI goes here
a.covered_by_roi = TRUE
)

The subplan 3 in the explain seemed to be looping through 3 million rows
which explained the slowdown....

QUERY
PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on da_answer a (cost=222.43..946804.85 rows=22309
width=70) (actual time=15.717..5141.001 rows=34810 loops=1)
Recheck Cond: (question_id = 18)
Filter: ((hashed SubPlan 1) OR ((alternatives: SubPlan 2 or hashed
SubPlan 3) AND (NOT (alternatives: SubPlan 4 or hashed SubPlan 5)) AND
covered_by_roi))
-> Bitmap Index Scan on daanswer_questionid (cost=0.00..221.26
rows=35695 width=0) (actual time=6.438..6.438 rows=35060 loops=1)
Index Cond: (question_id = 18)
SubPlan 1
-> Result (cost=0.00..0.05 rows=1 width=0) (actual time=3.683..4.621
rows=1728 loops=1)
SubPlan 2
-> Merge Join (cost=9.04..17.43 rows=1 width=0) (never executed)
Merge Cond: (v.sharing_group_id = t.sharing_group_id)
-> Index Scan using
clientanswervisibility_answerid_sharinggroupid_allow on
sp_client_answervisibility v (cost=0.00..8.38 rows=3 width=4) (never
executed)
Index Cond: (client_answer_id = $1)
-> Sort (cost=9.04..9.04 rows=4 width=4) (never executed)
Sort Key: t.sharing_group_id
-> Bitmap Heap Scan on sp_sharing_group_provider_tree t
(cost=2.05..9.03 rows=4 width=4) (never executed)
Recheck Cond: (provider_id = 0)
-> Bitmap Index Scan on
sharinggroupprovidertree_providerid (cost=0.00..2.05 rows=4 width=0) (never
executed)
Index Cond: (provider_id = 0)
SubPlan 3
-> Nested Loop (cost=0.00..52203.49 rows=2316644 width=4) (actual
time=0.053..2827.799 rows=3321883 loops=1)
-> Index Scan using sharinggroupprovidertree_providerid on
sp_sharing_group_provider_tree t (cost=0.00..10.03 rows=4 width=4) (actual
time=0.024..0.030 rows=3 loops=1)
Index Cond: (provider_id = 0)
-> Index Scan using spclientanswervisibility_sharinggroupid on
sp_client_answervisibility v (cost=0.00..13011.17 rows=14877 width=8)
(actual time=0.014..512.286 rows=1107294 loops=3)
Index Cond: (v.sharing_group_id = t.sharing_group_id)
Filter: v.visible
SubPlan 4
-> Nested Loop (cost=0.00..8.19 rows=1 width=0) (never executed)
-> Index Scan using
clientanswervisibility_answerid_sharinggroupid_deny on
sp_client_answervisibility v (cost=0.00..4.13 rows=1 width=4) (never
executed)
Index Cond: (client_answer_id = $1)
-> Index Scan using
sp_sharing_group_provider_tree_sharing_group_id_key on
sp_sharing_group_provider_tree t (cost=0.00..4.05 rows=1 width=4) (never
executed)
Index Cond: ((t.sharing_group_id = v.sharing_group_id) AND
(t.provider_id = 0))
SubPlan 5
-> Nested Loop (cost=2993.74..35065.77 rows=542897 width=4) (actual
time=105.162..105.162 rows=0 loops=1)
-> Bitmap Heap Scan on sp_sharing_group_provider_tree t
(cost=2.05..9.03 rows=4 width=4) (actual time=0.037..0.047 rows=3 loops=1)
Recheck Cond: (provider_id = 0)
-> Bitmap Index Scan on
sharinggroupprovidertree_providerid (cost=0.00..2.05 rows=4 width=0)
(actual time=0.027..0.027 rows=3 loops=1)
Index Cond: (provider_id = 0)
-> Bitmap Heap Scan on sp_client_answervisibility v
(cost=2991.69..8755.47 rows=3486 width=8) (actual time=35.030..35.030 rows=0
loops=3)
Recheck Cond: ((v.sharing_group_id = t.sharing_group_id)
AND (NOT v.visible))
-> Bitmap Index Scan on
clientanswervisibility_answerid_sharinggroupid_deny (cost=0.00..2991.51
rows=3486 width=0) (actual time=35.027..35.027 rows=0 loops=3)
Index Cond: (v.sharing_group_id = t.sharing_group_id)
Total runtime: 5170.291 ms
(42 rows)

So on a whim I tossed a LIMIT 1 into both exists clauses:

SELECT * FROM da_answer a
WHERE
a.provider_id IN ( SELECT visibility_bypass_providers( 0, 0 ) ) OR --
ownership
(
EXISTS ( -- Visibility grant
SELECT v.client_answer_id FROM sp_client_answervisibility v
JOIN sp_sharing_group_provider_tree t ON v.sharing_group_id =
t.sharing_group_id AND t.provider_id = 0
WHERE
v.client_answer_id = a.answer_id AND v.visible = TRUE
LIMIT 1
) AND NOT EXISTS ( -- Visibility deny
SELECT v.client_answer_id FROM sp_client_answervisibility v
JOIN sp_sharing_group_provider_tree t ON v.sharing_group_id =
t.sharing_group_id AND t.provider_id = 0
WHERE
v.client_answer_id = a.answer_id AND v.visible = FALSE
LIMIT 1
) AND --ROI goes here
a.covered_by_roi = TRUE
)

And it went from 5000+ ms to 90ms...

QUERY
PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on da_answer a (cost=222.43..946804.85 rows=22309
width=70) (actual time=15.850..84.705 rows=34810 loops=1)
Recheck Cond: (question_id = 18)
Filter: ((hashed SubPlan 1) OR ((SubPlan 2) AND (NOT (SubPlan 3)) AND
covered_by_roi))
-> Bitmap Index Scan on daanswer_questionid (cost=0.00..221.26
rows=35695 width=0) (actual time=6.319..6.319 rows=35060 loops=1)
Index Cond: (question_id = 18)
SubPlan 1
-> Result (cost=0.00..0.05 rows=1 width=0) (actual time=3.798..4.707
rows=1728 loops=1)
SubPlan 2
-> Limit (cost=9.04..17.43 rows=1 width=4) (actual time=0.007..0.007
rows=1 loops=1994)
-> Merge Join (cost=9.04..17.43 rows=1 width=4) (actual
time=0.007..0.007 rows=1 loops=1994)
Merge Cond: (v.sharing_group_id = t.sharing_group_id)
-> Index Scan using
clientanswervisibility_answerid_sharinggroupid_allow on
sp_client_answervisibility v (cost=0.00..8.38 rows=3 width=8) (actual
time=0.005..0.005 rows=1 loops=1994)
Index Cond: (client_answer_id = $1)
-> Sort (cost=9.04..9.04 rows=4 width=4) (actual
time=0.000..0.000 rows=1 loops=1856)
Sort Key: t.sharing_group_id
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on
sp_sharing_group_provider_tree t (cost=2.05..9.03 rows=4 width=4) (actual
time=0.011..0.014 rows=3 loops=1)
Recheck Cond: (provider_id = 0)
-> Bitmap Index Scan on
sharinggroupprovidertree_providerid (cost=0.00..2.05 rows=4 width=0)
(actual time=0.008..0.008 rows=3 loops=1)
Index Cond: (provider_id = 0)
SubPlan 3
-> Limit (cost=0.00..8.19 rows=1 width=4) (actual time=0.003..0.003
rows=0 loops=1744)
-> Nested Loop (cost=0.00..8.19 rows=1 width=4) (actual
time=0.003..0.003 rows=0 loops=1744)
-> Index Scan using
clientanswervisibility_answerid_sharinggroupid_deny on
sp_client_answervisibility v (cost=0.00..4.13 rows=1 width=8) (actual
time=0.002..0.002 rows=0 loops=1744)
Index Cond: (client_answer_id = $1)
-> Index Scan using
sp_sharing_group_provider_tree_sharing_group_id_key on
sp_sharing_group_provider_tree t (cost=0.00..4.05 rows=1 width=4) (actual
time=0.003..0.003 rows=0 loops=22)
Index Cond: ((t.sharing_group_id =
v.sharing_group_id) AND (t.provider_id = 0))
Total runtime: 91.263 ms
(28 rows)

I'm no backend guru, so I was hoping someone could explain what the original
query-plan was doing. If all you need to know is if a row exists, why loop
over all 3M rows? It seems very simplistic to assume the a LIMIT 1 clause
on the end of all EXISTS subqueries would be a general case optimization...
Right?

--
"An eye for eye only ends up making the whole world blind." -- Mohandas
Gandhi

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2009-10-02 17:56:59 Re: CREATE OR REPLACE FUNCTION vs ownership
Previous Message David E. Wheeler 2009-10-02 17:34:29 Re: latest hstore patch