Re: Postgres NOT IN vs NOT EXISTS optimization

From: Jeremy Smith <jeremy(at)musicsmith(dot)net>
To: "Dirschel, Steve" <steve(dot)dirschel(at)thomsonreuters(dot)com>
Cc: "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Postgres NOT IN vs NOT EXISTS optimization
Date: 2022-06-14 16:06:52
Message-ID: CAM8SmLW2mL0+2wzYRg68288MTnwB7CELOFF656J-DjYZ0AfLjQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

I think this explains the situation well:

https://wiki.postgresql.org/wiki/Don't_Do_This#Don.27t_use_NOT_IN

On Tue, Jun 14, 2022 at 11:59 AM Dirschel, Steve <
steve(dot)dirschel(at)thomsonreuters(dot)com> wrote:

> We are in the process of migrating from Oracle to Postgres and the
> following query does much less work with Oracle vs Postgres.
>
>
>
> explain (analyze, buffers)
>
> select favoritegr0_.FAVORITE_GROUP_SID as favorite1_2_,
> favoritegr0_.CHANGED as changed2_2_, favoritegr0_.TYPE_DISCRIMINATOR as
> type_dis3_2_,
>
> favoritegr0_.GROUP_NAME as group_na4_2_, favoritegr0_.IS_DELETED as
> is_delet5_2_, favoritegr0_.LAST_USED as last_use6_2_, favoritegr0_.POSITION
> as position7_2_,
>
> favoritegr0_.PRISM_GUID as prism_gu8_2_, favoritegr0_.PRODUCT_SID
> as product_9_2_,
>
> favoritegr0_.PRODUCT_VIEW as product10_2_, favoritegr0_.USAGE_TYPE
> as usage_t11_2_, favoritegr0_.ROW_VERSION as row_ver12_2_
>
> from cf0.FAVORITE_GROUP favoritegr0_
>
> where 'FORMS.WESTLAW' = favoritegr0_.PRODUCT_SID
>
> and favoritegr0_.PRODUCT_VIEW in ('DefaultProductView')
>
> and (favoritegr0_.FAVORITE_GROUP_SID not in
>
> (select favoriteen1_.FAVORITE_GROUP_SID
>
> from cf0.FAVORITE_GROUP_MEMBER favoriteen1_
>
> cross join cf0.CATEGORY_PAGE categorypa2_
>
> where
> favoriteen1_.CATEGORY_PAGE_SID=categorypa2_.CATEGORY_PAGE_SID
>
> and categorypa2_.UNIQUE_NAME='Florida'
>
> and categorypa2_.IS_DELETED=0
>
> and favoriteen1_.IS_DELETED=0))
>
> and favoritegr0_.IS_DELETED=0
>
> and (favoritegr0_.USAGE_TYPE=0 or favoritegr0_.USAGE_TYPE is null)
>
> and favoritegr0_.PRISM_GUID='ia74483420000012ca23eacf87bb0ed56'
>
> order by favoritegr0_.POSITION desc;
>
>
>
> Here is the plan in Postgres. It did 1426 shared block hits. If you look
> at this plan it is not pushing filtering into the NOT IN subquery- it is
> fully resolving that part of the query driving off where UNIQUE_NAME =
> 'Florida'.
>
>
>
>
> QUERY PLAN
>
>
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Sort (cost=5198.22..5198.22 rows=1 width=144) (actual time=6.559..6.560
> rows=1 loops=1)
>
> Sort Key: favoritegr0_."position" DESC
>
> Sort Method: quicksort Memory: 25kB
>
> Buffers: shared hit=1426
>
> -> Index Scan using favorite_group_idx01 on favorite_group
> favoritegr0_ (cost=5190.18..5198.21 rows=1 width=144) (actual
> time=6.514..6.515 rows=1 loops=1)
>
> Index Cond: (((prism_guid)::text =
> 'ia74483420000012ca23eacf87bb0ed56'::text) AND (is_deleted = 0))
>
> Filter: ((NOT (hashed SubPlan 1)) AND ((usage_type = 0) OR
> (usage_type IS NULL)) AND ('FORMS.WESTLAW'::text = (product_sid)::text) AND
> ((product_view)::text = 'DefaultProductView'::text))
>
> Buffers: shared hit=1423
>
> SubPlan 1
>
> -> Nested Loop (cost=0.70..5189.90 rows=1 width=33) (actual
> time=6.459..6.459 rows=0 loops=1)
>
> Buffers: shared hit=1417
>
> -> Index Scan using category_page_idx04 on category_page
> categorypa2_ (cost=0.42..5131.71 rows=7 width=33) (actual
> time=0.035..6.138 rows=92 loops=1)
>
> Index Cond: ((unique_name)::text = 'Florida'::text)
>
> Filter: (is_deleted = 0)
>
> Buffers: shared hit=1233
>
> -> Index Scan using favorite_group_member_idx03 on
> favorite_group_member favoriteen1_ (cost=0.28..8.30 rows=1 width=66)
> (actual time=0.003..0.003 rows=0 loops=92)
>
> Index Cond: ((category_page_sid)::text =
> (categorypa2_.category_page_sid)::text)
>
> Filter: (is_deleted = 0)
>
> Buffers: shared hit=184
>
> Planning Time: 1.624 ms
>
> Execution Time: 6.697 ms
>
>
>
> If I compare that to the plan Oracle uses it pushes the
> favoritegr0_.FAVORITE_GROUP_SID predicate into the NOT IN. I'm able to get
> a similar plan with Postgres if I change the NOT IN to a NOT EXISTS:
>
>
>
> explain (analyze, buffers)
>
> select favoritegr0_.FAVORITE_GROUP_SID as favorite1_2_,
> favoritegr0_.CHANGED as changed2_2_, favoritegr0_.TYPE_DISCRIMINATOR as
> type_dis3_2_,
>
> favoritegr0_.GROUP_NAME as group_na4_2_, favoritegr0_.IS_DELETED as
> is_delet5_2_, favoritegr0_.LAST_USED as last_use6_2_, favoritegr0_.POSITION
> as position7_2_,
>
> favoritegr0_.PRISM_GUID as prism_gu8_2_, favoritegr0_.PRODUCT_SID
> as product_9_2_,
>
> favoritegr0_.PRODUCT_VIEW as product10_2_, favoritegr0_.USAGE_TYPE
> as usage_t11_2_, favoritegr0_.ROW_VERSION as row_ver12_2_
>
> from cf0.FAVORITE_GROUP favoritegr0_
>
> where 'FORMS.WESTLAW' = favoritegr0_.PRODUCT_SID
>
> and favoritegr0_.PRODUCT_VIEW in ('DefaultProductView')
>
> and not exists (
>
> select 'x'
>
> from cf0.FAVORITE_GROUP_MEMBER favoriteen1_
>
> cross join cf0.CATEGORY_PAGE categorypa2_
>
> where
> favoriteen1_.CATEGORY_PAGE_SID=categorypa2_.CATEGORY_PAGE_SID
>
> and categorypa2_.UNIQUE_NAME='Florida'
>
> and categorypa2_.IS_DELETED=0
>
> and favoriteen1_.IS_DELETED=0
>
> and favoritegr0_.FAVORITE_GROUP_SID =
> favoriteen1_.FAVORITE_GROUP_SID)
>
> and favoritegr0_.IS_DELETED=0
>
> and (favoritegr0_.USAGE_TYPE=0 or favoritegr0_.USAGE_TYPE is null)
>
> and favoritegr0_.PRISM_GUID='ia74483420000012ca23eacf87bb0ed56'
>
> order by favoritegr0_.POSITION desc;
>
>
>
> Here you can see the query did 5 shared block hits- much better than the
> plan above. It's pushing the predicate into the NOT EXISTS with a Nested
> Loop Anti Join.
>
>
>
>
> QUERY PLAN
>
>
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Sort (cost=121.50..121.51 rows=1 width=144) (actual time=0.027..0.028
> rows=1 loops=1)
>
> Sort Key: favoritegr0_."position" DESC
>
> Sort Method: quicksort Memory: 25kB
>
> Buffers: shared hit=5
>
> -> Nested Loop Anti Join (cost=5.11..121.49 rows=1 width=144) (actual
> time=0.021..0.022 rows=1 loops=1)
>
> Buffers: shared hit=5
>
> -> Index Scan using favorite_group_idx01 on favorite_group
> favoritegr0_ (cost=0.28..8.30 rows=1 width=144) (actual time=0.012..0.012
> rows=1 loops=1)
>
> Index Cond: (((prism_guid)::text =
> 'ia74483420000012ca23eacf87bb0ed56'::text) AND (is_deleted = 0))
>
> Filter: (((usage_type = 0) OR (usage_type IS NULL)) AND
> ('FORMS.WESTLAW'::text = (product_sid)::text) AND ((product_view)::text =
> 'DefaultProductView'::text))
>
> Buffers: shared hit=3
>
> -> Nested Loop (cost=4.83..113.18 rows=1 width=33) (actual
> time=0.008..0.009 rows=0 loops=1)
>
> Buffers: shared hit=2
>
> -> Bitmap Heap Scan on favorite_group_member favoriteen1_
> (cost=4.41..56.40 rows=17 width=66) (actual time=0.007..0.008 rows=0
> loops=1)
>
> Recheck Cond:
> ((favoritegr0_.favorite_group_sid)::text = (favorite_group_sid)::text)
>
> Filter: (is_deleted = 0)
>
> Buffers: shared hit=2
>
> -> Bitmap Index Scan on favorite_group_member_idx02
> (cost=0.00..4.41 rows=17 width=0) (actual time=0.003..0.003 rows=0 loops=1)
>
> Index Cond: ((favorite_group_sid)::text =
> (favoritegr0_.favorite_group_sid)::text)
>
> Buffers: shared hit=2
>
> -> Index Scan using category_page_pkey on category_page
> categorypa2_ (cost=0.42..3.30 rows=1 width=33) (never executed)
>
> Index Cond: ((category_page_sid)::text =
> (favoriteen1_.category_page_sid)::text)
>
> Filter: (((unique_name)::text = 'Florida'::text) AND
> (is_deleted = 0))
>
> Planning Time: 0.554 ms
>
> Execution Time: 0.071 ms
>
>
>
> Is Postgres able to drive the query the same way with the NOT IN as the
> NOT EXISTS is doing or is that only available if the query has a NOT
> EXISTS? I don't see an option to push predicate or something like that
> using pg_hint_plan. I'm not sure if there are any optimizer settings that
> may tell Postgres to treat the NOT IN like a NOT EXISTS when optimizing
> this type of query.
>
>
>
> Thanks in advance
>
> Steve
> This e-mail is for the sole use of the intended recipient and contains
> information that may be privileged and/or confidential. If you are not an
> intended recipient, please notify the sender by return e-mail and delete
> this e-mail and any attachments. Certain required legal entity disclosures
> can be accessed on our website:
> https://www.thomsonreuters.com/en/resources/disclosures.html
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2022-06-14 16:09:16 Re: Postgres NOT IN vs NOT EXISTS optimization
Previous Message Dirschel, Steve 2022-06-14 15:58:39 Postgres NOT IN vs NOT EXISTS optimization

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2022-06-14 16:08:16 Small TAP improvements
Previous Message Dirschel, Steve 2022-06-14 15:58:39 Postgres NOT IN vs NOT EXISTS optimization