=ANY(ARRAY) vs =ANY(ARRAY(expr)) performance

From: Ramdip Gill <ramdip(dot)singhgill(at)gmail(dot)com>
To: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: =ANY(ARRAY) vs =ANY(ARRAY(expr)) performance
Date: 2022-11-14 01:49:13
Message-ID: CAGg-0s_dqmO3guJu9=Nri1hsiEcdAYasAJLW29_r92LUC3MC5Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hello!

I have been struggling with finding a proper solution for this query for
some time and wanted to ask if someone here knows how to approach this?

I have a table named "report" which has an index on report.reporter_id.
This column consists of IDs which are grouped together using a table named
"group_links".
So for every reporter id which is part of the same group, there is a row in
"group_links" with the same group_id.

Now, I noticed that I can select reports for a group in two ways. Both
queries return the same but one is using =ANY(ARRAY(expr)) ("subselect")
and one is using =ANY(ARRAY) ("static array") with the same array as the
expression would return.
The static array query is running very fast for small selections and where
not a lot of rows match the condition. It uses a bitmap index scan.
The subselect is running very slow and uses an index scan. However, it is
particularly slow if not many rows match the condition and thus a lot of
rows are filtered while scanning the index.
I was able to reproduce a similar issue with using `= ANY(VALUES)`
instead of `= ANY(ARRAY)`:

1. fast query using =ANY(ARRAY): https://explain.depesz.com/s/dwP8
2. slow query using =ANY(ARRAY(expr)): https://explain.depesz.com/s/3hGb
3. slow query using =ANY(VALUES): https://explain.depesz.com/s/cYrn

I guess the difference comes from the query planner not being able to know
the exact values for the WHERE condition beforehand. But how should cases
like this be best handled?

Should I denormalize the data such that I have a table with columns
report.id and group_id and report.created such that I can create an index
on (created, group_id)? Then I don't have to do a subselect anymore.

I would be very glad for any help regarding this!

Postgres version: PostgreSQL 13.3 (Debian 13.3-1.pgdg100+1) on
x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
------------------------------------

> \d report
> Table "public.report"
> Column | Type | Collation | Nullable | Default
> ---------------+--------------------------+-----------+----------+---------
> reporter_id | uuid | | not null |
> parsed | boolean | | |
> id | text | | not null |
> request_id | uuid | | |
> created | timestamp with time zone | | not null | now()
> customer | text | | |
> subject | text | | |
> parser_result | text | | not null |
> parser | text | | |
> event_types | jsonb | | |
> event_count | integer | | |
> account_id | integer | | |
> reviewable | boolean | | not null | false
> reviewed | boolean | | not null | false
> Indexes:
> "PK_99e4d0bea58cba73c57f935a546" PRIMARY KEY, btree (id)
> "idx_report_created_desc_id_asc" btree (created DESC, id)
> "idx_report_created_desc_reporter_id_asc" btree (created DESC,
> reporter_id)
> "idx_report_event_types" gin (event_types)
> "idx_report_parser_gin" gin (parser gin_trgm_ops)
> "idx_report_parser_result_created_desc" btree (parser_result, created
> DESC)
> "idx_report_reporter_id_asc_created_desc" btree (reporter_id, created
> DESC)
> "idx_report_request_id_asc_created_desc" btree (request_id, created
> DESC)
> "idx_report_subject_gin" gin (subject gin_trgm_ops)
> Check constraints:
> "report_parser_result_constraint" CHECK (parser_result = ANY
> (ARRAY['PARSED'::text, 'UNPARSED'::text, 'REJECTED'::text]))
> Foreign-key constraints:
> "FK_5b809608bb38d119333b69f65f9" FOREIGN KEY (request_id) REFERENCES
> request(id)
> "FK_d41df66b60944992386ed47cf2e" FOREIGN KEY (reporter_id) REFERENCES
> reporter(id)
> Referenced by:
> TABLE "event" CONSTRAINT "event_report_id_foreign" FOREIGN KEY
> (report_id) REFERENCES report(id)

------------------------------------

> SELECT relname, relpages, reltuples, relallvisible, relkind, relnatts,
> relhassubclass, reloptions, pg_table_size(oid) FROM pg_class WHERE
> relname='report';
> relname | relpages | reltuples | relallvisible | relkind | relnatts |
> relhassubclass | reloptions | pg_table_size
>
> ---------+----------+---------------+---------------+---------+----------+----------------+------------+---------------
> report | 2062252 | 8.5893344e+07 | 2062193 | r | 22 |
> f | | 16898801664
> (1 row)

------------------------------------

> \d group_links
> Table "public.group_links"
> Column | Type | Collation | Nullable |
> Default
>
> ------------------+--------------------------+-----------+----------+-------------------
> rule_id | uuid | | not null |
> reporter_id | uuid | | not null |
> group_id | uuid | | not null |
> exclusion | boolean | | | false
> last_update_time | timestamp with time zone | | |
> CURRENT_TIMESTAMP
> Indexes:
> "group_rules_matches_pkey" PRIMARY KEY, btree (rule_id, reporter_id)
> "idx_group_rules_matches_group_id" btree (group_id)
> "idx_group_rules_matches_group_id_reporter_id_exclusion" btree
> (group_id, reporter_id, exclusion)
> "idx_group_rules_matches_reporter_id" btree (reporter_id)
> Foreign-key constraints:
> "group_rules_matches_group_id_foreign" FOREIGN KEY (group_id)
> REFERENCES "group"(id) ON DELETE CASCADE
> "group_rules_matches_reporter_id_foreign" FOREIGN KEY (reporter_id)
> REFERENCES reporter(id)

"group_rules_matches_rule_id_foreign" FOREIGN KEY (rule_id) REFERENCES
> group_rules(id) ON DELETE CASCADE

------------------------------------

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Ramdip Gill 2022-11-14 04:17:17 Re: =ANY(ARRAY) vs =ANY(ARRAY(expr)) performance
Previous Message Vince McMahon 2022-10-24 14:31:44 Re: Explain returns different number of rows