Re: Bad query plan when you add many OR conditions

From: Thomas Kellerer <spam_eater(at)gmx(dot)net>
To: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Bad query plan when you add many OR conditions
Date: 2020-01-14 08:01:30
Message-ID: 9748c027-a524-2718-28ec-3c6bc74feafb@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Marco Colli schrieb am 10.01.2020 um 02:11:
> I have a query on a large table that is very fast (0s):
> https://gist.github.com/collimarco/039412b4fe0dcf39955888f96eff29db#file-fast_query-txt
>
> Basically the query matches the rows that have a tag1 OR tag2 OR tag3 OR tag4 OR tag5... 
>
> However if you increase the number of OR at some point PostgreSQL makes the bad decision to change its query plan! And the new plan makes the query terribly slow:
> https://gist.github.com/collimarco/039412b4fe0dcf39955888f96eff29db#file-slow_query-txt
>
> Instead of this (which is fast):
>   Bitmap Index Scan on index_subscriptions_on_project_id_and_tags
> It starts using this (which is slow):
>   Parallel Index Scan using index_subscriptions_on_project_id_and_created_at
> The choice seems quite stupid since it doesn't have the tags on the new index... and indeed the query takes about 1 minute instead of a few milliseconds. Here's a list of the available indexes:
> https://gist.github.com/collimarco/039412b4fe0dcf39955888f96eff29db#file-_indexes-txt
>
> How can I encourage PostgreSQL to use the Bitmap Index Scan even when there are many OR conditions? I have tried with VACUUM ANALYZE subscriptions but it doesn't help.
>
> Note: the query is generated dynamically by customers of a SaaS, so I don't have full control on it

Can you replace the many ORs with a single "overlaps" comparison?

This

(tags @> ARRAY['crt:2018_04']::varchar[]) OR (tags @> ARRAY['crt:2018_05']::varchar[]) OR (tags @> ARRAY['crt:2018_06']::varchar[])

is equivalent to

tags && array['crt:2018_04','crt:2018_05','crt:2018_06', ...]

The && operator can make use of a GIN index so maybe that uses the index_subscriptions_on_project_id_and_tags regardless of the number of elements.

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Joao Junior 2020-01-14 15:29:51 shared buffers and startup process
Previous Message Shira Bezalel 2020-01-14 00:11:48 Re: Seeking reason behind performance gain in 12 with HashAggregate