Quick Links

=ANY(ARRAY) vs =ANY(ARRAY(expr)) performance

From:	Ramdip Gill <ramdip(dot)singhgill(at)gmail(dot)com>
To:	pgsql-performance(at)lists(dot)postgresql(dot)org
Subject:	=ANY(ARRAY) vs =ANY(ARRAY(expr)) performance
Date:	2022-11-14 01:49:13
Message-ID:	CAGg-0s_dqmO3guJu9=Nri1hsiEcdAYasAJLW29_r92LUC3MC5Q@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

Hello!

I have been struggling with finding a proper solution for this query for
some time and wanted to ask if someone here knows how to approach this?

I have a table named "report" which has an index on report.reporter_id.
This column consists of IDs which are grouped together using a table named
"group_links".
So for every reporter id which is part of the same group, there is a row in
"group_links" with the same group_id.

Now, I noticed that I can select reports for a group in two ways. Both
queries return the same but one is using =ANY(ARRAY(expr)) ("subselect")
and one is using =ANY(ARRAY) ("static array") with the same array as the
expression would return.
The static array query is running very fast for small selections and where
not a lot of rows match the condition. It uses a bitmap index scan.
The subselect is running very slow and uses an index scan. However, it is
particularly slow if not many rows match the condition and thus a lot of
rows are filtered while scanning the index.
I was able to reproduce a similar issue with using `= ANY(VALUES)`
instead of `= ANY(ARRAY)`:

1. fast query using =ANY(ARRAY): https://explain.depesz.com/s/dwP8
2. slow query using =ANY(ARRAY(expr)): https://explain.depesz.com/s/3hGb
3. slow query using =ANY(VALUES): https://explain.depesz.com/s/cYrn

I guess the difference comes from the query planner not being able to know
the exact values for the WHERE condition beforehand. But how should cases
like this be best handled?

Should I denormalize the data such that I have a table with columns
report.id and group_id and report.created such that I can create an index
on (created, group_id)? Then I don't have to do a subselect anymore.

I would be very glad for any help regarding this!

Postgres version: PostgreSQL 13.3 (Debian 13.3-1.pgdg100+1) on
x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
------------------------------------

------------------------------------

> SELECT relname, relpages, reltuples, relallvisible, relkind, relnatts,
> relhassubclass, reloptions, pg_table_size(oid) FROM pg_class WHERE
> relname='report';
> relname | relpages | reltuples | relallvisible | relkind | relnatts |
> relhassubclass | reloptions | pg_table_size
>
> ---------+----------+---------------+---------------+---------+----------+----------------+------------+---------------
> report | 2062252 | 8.5893344e+07 | 2062193 | r | 22 |
> f | | 16898801664
> (1 row)

------------------------------------

> \d group_links
> Table "public.group_links"
> Column | Type | Collation | Nullable |
> Default
>
> ------------------+--------------------------+-----------+----------+-------------------
> rule_id | uuid | | not null |
> reporter_id | uuid | | not null |
> group_id | uuid | | not null |
> exclusion | boolean | | | false
> last_update_time | timestamp with time zone | | |
> CURRENT_TIMESTAMP
> Indexes:
> "group_rules_matches_pkey" PRIMARY KEY, btree (rule_id, reporter_id)
> "idx_group_rules_matches_group_id" btree (group_id)
> "idx_group_rules_matches_group_id_reporter_id_exclusion" btree
> (group_id, reporter_id, exclusion)
> "idx_group_rules_matches_reporter_id" btree (reporter_id)
> Foreign-key constraints:
> "group_rules_matches_group_id_foreign" FOREIGN KEY (group_id)
> REFERENCES "group"(id) ON DELETE CASCADE
> "group_rules_matches_reporter_id_foreign" FOREIGN KEY (reporter_id)
> REFERENCES reporter(id)

"group_rules_matches_rule_id_foreign" FOREIGN KEY (rule_id) REFERENCES
> group_rules(id) ON DELETE CASCADE

------------------------------------

Responses

Re: =ANY(ARRAY) vs =ANY(ARRAY(expr)) performance at 2022-11-14 04:17:17 from Ramdip Gill
Re: =ANY(ARRAY) vs =ANY(ARRAY(expr)) performance at 2022-11-14 14:11:18 from Rick Otten

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Ramdip Gill	2022-11-14 04:17:17	Re: =ANY(ARRAY) vs =ANY(ARRAY(expr)) performance
Previous Message	Vince McMahon	2022-10-24 14:31:44	Re: Explain returns different number of rows