Re: Efficiency of EXISTS?

From: Kenneth Tilton <ktilton(at)mcna(dot)net>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Efficiency of EXISTS?
Date: 2012-07-23 22:12:54
Message-ID: CAECCA8bduPiZq0tLgFEA8Q-CctTeOJ8QvWSLUv3p7GG8quWfag@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Mon, Jul 23, 2012 at 2:52 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:

> On Mon, Jul 23, 2012 at 4:12 PM, Kenneth Tilton <ktilton(at)mcna(dot)net> wrote:
> > My mental model of the EXISTS clause must be off. This snippet appears at
> > the end of a series of WITH clauses I suspect are irrelevant:
> >
> >> with etc etc ... , cids as
> >> (select distinct c.id from ddr2 c
> >> join claim_entries ce on ce.claim_id = c.id
> >> where (c.assigned_ddr = 879
> >> or exists (select 1 from ddr_cdt dc
> >> where
> >> dc.sys_user_id = 879
> >> and dc.document_type = c.document_type
> >> -- makes it faster: and (dc.cdt_code is null or dc.cdt_code =
> ce.cpt_code)
> >> )))
> >>
> >> select count(*) from cids
> >
> >
> > If I uncomment the bit where it says "make it faster" I get decent
> response
> > and the graphical analyze display shows the expected user+doctype+cdtcode
> > index is being used (and nice thin lines suggesting efficient lookup).
> >
> > As it is, the analyze display shows the expected user+doctype index*
> being
> > used but the lines are fat, and performance is an exponential disaster.
> >
> > * I created the (to me ) redundant user+doctype index trying to get
> Postgres
> > to Do the Right Thing(tm), but I can see that was not the issue.
> >
> > I presume the reason performance drops off a cliff is because there can
> be
> > 9000 cdt_codes for one user+doctype, but I was hoping EXISTS would just
> look
> > to see if there was at least one row matching user+doctype and return its
> > decision. I have tried select *, select 1, and limit 1 on the nested
> select
> > to no avail.
> >
> > Am I just doing something wrong? I am a relative noob. Is there some
> other
> > hint I can give the planner?
>
> hard to say without having the explain analyze output. also it's not
> clear why you need to use WITH, at least for the terminating query.
> I'd just do:
>
> select count(*) from
> (
> inner_query
> )
>

OK. Here is the full query:

with ddr as (
select c.id
,case
when c.portal_user_id is null then u.provider_facility_id
else pu.provider_facility_id
end provider_facility_id
from claims c
left join sys_users u on u.id = c.created_by
left join portal_users pu on pu.id = c.portal_user_id
WHERE c.deleted = 0
AND c.status >= 0
AND (c.created_by is not null or c.portal_user_id is not null)
AND true not in ( select ineligible_code_id in (46,65)
from claim_carcs cc
where c.id = cc.claim_id
and cc.deleted = 0 )
AND (false OR c.document_type = 0)
AND (false OR c.group_plan_id = 44)

limit 1500
)

,ddr2 as (
select c.id
, c.document_type
, c.radiographs
, c.nea_number
, c.assigned_ddr
, d.provider_facility_id as submitting_facility_id
, count(ca.id) as claim_attachments_count
, cast(exists (select 1 from triples where s = c.id and sda='claims' and p
= 'ddr-review-passed-on-by') as boolean) as passedon
from ddr d
inner join
claims c on d.id = c.id
join claim_attachments ca on c.id = ca.claim_id
group by
c.id
, submitting_facility_id
having ((nullif(trim(c.nea_number, ' '),'') is not null)
or case transmission_method
when 'P' then count(distinct ca.id) > 1
else count(distinct ca.id) > 0
end
or c.radiographs > 0))

, cids as
(select distinct c.id from ddr2 c
join claim_entries ce on ce.claim_id = c.id
where (c.assigned_ddr = 879
or exists (select 1 from ddr_cdt dc
where
dc.sys_user_id = 879
and dc.document_type = c.document_type
--and (dc.cdt_code is null or dc.cdt_code = ce.cpt_code)
)))
select count(*) from cids

And the explain output:

"Aggregate (cost=56060.60..56060.61 rows=1 width=0)"
" CTE ddr"
" -> Limit (cost=306.29..16203.83 rows=1500 width=16)"
" -> Nested Loop Left Join (cost=306.29..7442626.75 rows=702214
width=16)"
" -> Hash Left Join (cost=306.29..7244556.97 rows=702214
width=12)"
" Hash Cond: (c.created_by = u.id)"
" -> Index Scan using claims_lca1 on claims c
(cost=0.00..7230212.96 rows=702214 width=12)"
" Index Cond: ((deleted = 0) AND (status >= 0)
AND (group_plan_id = 44) AND (document_type = 0))"
" Filter: (((created_by IS NOT NULL) OR
(portal_user_id IS NOT NULL)) AND (NOT (SubPlan 1)))"
" SubPlan 1"
" -> Index Scan using claim_carcs_claim_id on
claim_carcs cc (cost=0.00..9.23 rows=1 width=4)"
" Index Cond: (c.id = claim_id)"
" Filter: (deleted = 0)"
" -> Hash (cost=224.46..224.46 rows=6546 width=8)"
" -> Seq Scan on sys_users u
(cost=0.00..224.46 rows=6546 width=8)"
" -> Index Scan using portal_users_pkey on portal_users pu
(cost=0.00..0.27 rows=1 width=8)"
" Index Cond: (id = c.portal_user_id)"
" CTE ddr2"
" -> GroupAggregate (cost=25714.40..28093.98 rows=286 width=27)"
" Filter: ((NULLIF(btrim((c.nea_number)::text, ' '::text),
''::text) IS NOT NULL) OR CASE c.transmission_method WHEN 'P'::bpchar THEN
(count(DISTINCT ca.id) > 1) ELSE (count(DISTINCT ca.id) > 0) END OR
(c.radiographs > 0))"
" -> Sort (cost=25714.40..25715.11 rows=286 width=27)"
" Sort Key: c.id, d.provider_facility_id"
" -> Nested Loop (cost=0.00..25702.73 rows=286 width=27)"
" -> Nested Loop (cost=0.00..12752.74 rows=1500
width=27)"
" -> CTE Scan on ddr d (cost=0.00..30.00
rows=1500 width=8)"
" -> Index Scan using claims_pkey on claims c
(cost=0.00..8.47 rows=1 width=19)"
" Index Cond: (id = d.id)"
" -> Index Scan using claim_attachments_claim on
claim_attachments ca (cost=0.00..8.61 rows=2 width=8)"
" Index Cond: (claim_id = c.id)"
" SubPlan 3"
" -> Index Scan using triples_s_idx on triples
(cost=0.00..8.28 rows=1 width=0)"
" Index Cond: (s = c.id)"
" Filter: ((sda = 'claims'::text) AND (p =
'ddr-review-passed-on-by'::text))"
" SubPlan 4"
" -> Bitmap Heap Scan on triples (cost=102.70..1010.15
rows=823 width=8)"
" Recheck Cond: (p = 'ddr-review-passed-on-by'::text)"
" Filter: (sda = 'claims'::text)"
" -> Bitmap Index Scan on triples_p_idx
(cost=0.00..102.49 rows=3497 width=0)"
" Index Cond: (p = 'ddr-review-passed-on-by'::text)"
" CTE cids"
" -> HashAggregate (cost=11759.51..11760.52 rows=101 width=4)"
" -> Nested Loop (cost=0.00..11722.94 rows=14627 width=4)"
" -> CTE Scan on ddr2 c (cost=0.00..112.75 rows=144
width=4)"
" Filter: ((assigned_ddr = 879) OR (alternatives:
SubPlan 6 or hashed SubPlan 7))"
" SubPlan 6"
" -> Seq Scan on ddr_cdt dc (cost=0.00..134293.58
rows=361282 width=0)"
" Filter: ((sys_user_id = 879) AND
(document_type = c.document_type))"
" SubPlan 7"
" -> Bitmap Heap Scan on ddr_cdt dc
(cost=20292.74..73868.80 rows=1083845 width=4)"
" Recheck Cond: (sys_user_id = 879)"
" -> Bitmap Index Scan on
"ddr-cdt-idx-user-doc" (cost=0.00..20021.78 rows=1083845 width=0)"
" Index Cond: (sys_user_id = 879)"
" -> Index Scan using claim_entries_claim_id on
claim_entries ce (cost=0.00..79.35 rows=102 width=4)"
" Index Cond: (claim_id = c.id)"
" -> CTE Scan on cids (cost=0.00..2.02 rows=101 width=0)"

More interesting: I tried reducing the complex query to a simpler query and
what I saw was that my mental model of EXISTS is fine. :) It was efficient
in the way I expected, and faster than the version that did the last test
(the cdt_code test). Now I just have to find out why it is slower in vivo.

Thx, ken

--
Kenneth Tilton

*Director of Software Development*

*MCNA Dental Plans*
200 West Cypress Creek Road
Suite 500
Fort Lauderdale, FL 33309

954-730-7131 X181 (Office)
954-628-3347 (Fax)
1-800-494-6262 X181 (Toll Free)

ktilton(at)mcna(dot)net <glipari(at)mcna(dot)net> (Email)

www.mcna.net (Website)
CONFIDENTIALITY NOTICE: This electronic mail may contain information that
is privileged, confidential, and/or otherwise protected from disclosure to
anyone other than its intended recipient(s). Any dissemination or use of
this electronic mail or its contents by persons other than the intended
recipient(s) is strictly prohibited. If you have received this
communication in error, please notify the sender immediately by reply
e-mail so that we may correct our internal records. Please then delete the
original message. Thank you.

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Craig Ringer 2012-07-24 00:30:22 Re: Odd blocking (or massively latent) issue - even with EXPLAIN
Previous Message Merlin Moncure 2012-07-23 21:52:33 Re: Efficiency of EXISTS?