BUG #17922: ANY vs IN execution plan difference for a single explicit input value

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: kaarel(dot)moppel(at)gmail(dot)com
Subject: BUG #17922: ANY vs IN execution plan difference for a single explicit input value
Date: 2023-05-06 20:56:21
Message-ID: 17922-1e2e0aeedd294424@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 17922
Logged by: Kaarel Moppel
Email address: kaarel(dot)moppel(at)gmail(dot)com
PostgreSQL version: 15.2
Operating system: Ubuntu 22.04
Description:

I guess there can be differences how IN vs ANY input is handled for prepared
statements but for plain SQL input I would expect them to produce identical
plans. Seems to be not the case though for a below scenario which we
stumbled on in production where there's a runtime difference of around 5
orders of magnitude as a different leading index is chosen. A repeatable
test case:

CREATE UNLOGGED TABLE a (
node_id int8 NOT NULL GENERATED BY DEFAULT AS IDENTITY,
project_id int8 NOT NULL
);

INSERT INTO a (project_id)
SELECT
CASE WHEN random() < 0.99 THEN
1
ELSE
2
END
FROM
generate_series(1, 8e6);

CREATE UNIQUE INDEX a_uq ON a (project_id, node_id);

VACUUM ANALYZE a;

CREATE UNLOGGED TABLE b (
node_id int8 NOT NULL UNIQUE
);

INSERT INTO b (node_id)
SELECT
node_id
FROM (
SELECT
node_id
FROM
a
WHERE
node_id > 7e6
LIMIT 15000) x;

VACUUM ANALYZE b;

EXPLAIN
SELECT
a.*
FROM
a
JOIN b USING (node_id)
WHERE
a.project_id = ANY (ARRAY[1]::int8[])
ORDER BY
a.project_id,
a.node_id
LIMIT 10;

EXPLAIN
SELECT
a.*
FROM
a
JOIN b USING (node_id)
WHERE
a.project_id IN (1) -- Also OK with a plain "=1" witout IN
ORDER BY
a.project_id,
a.node_id
LIMIT 10;

QUERY PLAN

───────────────────────────────────────────────────────────────────────────────────────────
Limit (cost=0.72..1789.68 rows=10 width=16)
-> Nested Loop (cost=0.72..2655175.45 rows=14842 width=16)
-> Index Only Scan using a_uq on a (cost=0.43..260452.19
rows=7915872 width=16)
Index Cond: (project_id = ANY ('{1}'::bigint[]))
-> Index Only Scan using b_node_id_key on b (cost=0.29..0.30
rows=1 width=8)
Index Cond: (node_id = a.node_id)
(6 rows)

QUERY PLAN

──────────────────────────────────────────────────────────────────────────────────────────────
Limit (cost=0.72..38.07 rows=10 width=16)
-> Nested Loop (cost=0.72..55440.79 rows=14842 width=16)
-> Index Only Scan using b_node_id_key on b (cost=0.29..397.29
rows=15000 width=8)
-> Index Only Scan using a_uq on a (cost=0.43..3.67 rows=1
width=16)
Index Cond: ((project_id = 1) AND (node_id = b.node_id))
(5 rows)

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2023-05-07 01:47:08 Re: BUG #17922: ANY vs IN execution plan difference for a single explicit input value
Previous Message Nacho Caballero 2023-05-06 17:10:33 Re: Bug in tzdata 2022g