Is the plan for IN(1,2,3) always the same as for =ANY('{1,2,3}') when using PQexec with no params?

From: Dmitry Koterov <dmitry(dot)koterov(at)gmail(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Is the plan for IN(1,2,3) always the same as for =ANY('{1,2,3}') when using PQexec with no params?
Date: 2022-11-17 03:40:13
Message-ID: CA+CZih7CB5t-EO-XE5DHaStzbVfFN8CqfyKB1Nkc9JQocVntWQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi.

PG13+. Assume we have two identical queries with no arguments (as a plain
text, e.g. passed to PQexec - NOT to PQexecParams!):

- one with "a=X AND b IN(...)"
- and one with "a=X and b=ANY('{...}')

The question: is it guaranteed that the planner will always choose
identical plans for them (or, at least, the plan for ANY will not match an
existing index worse than the plan with IN)? In my experiments it shows
that the answer is "yes", but I don't know the PG internals to make sure
that it's true in ALL situations.

Assumptions:

- The number of values in IN/ANY is of medium cardinality (say, 10-100
values)
- Again, all those values are static; no parameters are involved; plain
simple SQL as a text
- There is also another column "a" which is compared against a constant
("a" is of 1000x lower cardinality than "b" to make it interesting), and an
index on (a, b)

Example:

create table test(a bigint, b bigint);
create index test_idx on test(a, b);

truncate test;
insert into test(a, b) select round(s/10000), s from generate_series(1,
1000000) s;

# explain analyze select * from test where *a=10 and b in(1,2,3)*;
----------------------------------------------------------------------------
Index Only Scan using test_idx on test (cost=0.42..13.31 rows=1 width=16)
Index Cond: ((a = 10) AND (b = ANY ('{1,2,3}'::bigint[])))

# explain analyze select * from test where *a=10 and b=any('{1,2,3}')*;
----------------------------------------------------------------------------
Index Only Scan using test_idx on test (cost=0.42..13.31 rows=1 width=16)
Index Cond: ((a = 10) AND (b = ANY ('{1,2,3}'::bigint[])))

It shows exactly the same plan here. *Would it always be the same, or it
may be different?* (E.g. my worry is that for IN variant, the planner can
use the statistics against the actual values in that IN parentheses, whilst
when using ANY('{...}'), I can imagine that in some circumstances, it would
ignore the actual values within the literal and instead build a "generic"
plan which may not utilize the index properly; is it the case?)

P.S.
A slightly correlated question was raised on StackOverflow, e.g.
https://stackoverflow.com/questions/34627026/in-vs-any-operator-in-postgresql
- but it wasn't about this particular usecase, they were discussing more
complicated things, like when each element of an IN/ANY clause is actually
a pair of elements (which makes the planner go crazy in some
circumstances). My question is way simpler, I don't deal with clauses like
"IN((1,2),(3,4),(5,6))" etc.; it's all about the plain 2-column selects.
Unfortunately, it's super-hard to find more info about this question,
because both "in" and "any" are stop-words in search engines, so they don't
show good answers, including any of the PG mail list archives.

P.S.S.
Why does the answer matter? Because for "IN(1,2,3)" case, e.g.
pg_stat_statements generalizes the query to "IN($1,$2,$3)" and doesn't
coalesce multiple queries into one, whilst for "=ANY('{1,2,3}')", it
coalesces them all to "=ANY($1)". Having those 1,2,3,... of a different
cardinality all the time, the logs/stats are flooded with the useless
variants of the same query basically. It also applies to e.g. logging to
Datadog which normalizes the queries. We'd love to use "=ANY(...)" variant
everywhere and never use IN() anymore, but are scared of getting some
unexpected regressions.

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2022-11-17 03:48:14 Re: CI and test improvements
Previous Message Justin Pryzby 2022-11-17 03:15:43 Re: pg_upgrade test failure