BUG #16993: Query Planner does not use index for EXISTS-query on a large table with all equal foreign key values

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: a(dot)schnabl(at)synedra(dot)com
Subject: BUG #16993: Query Planner does not use index for EXISTS-query on a large table with all equal foreign key values
Date: 2021-05-05 13:24:05
Message-ID: 16993-1298e8dc9c41ab97@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 16993
Logged by: A Schnabl
Email address: a(dot)schnabl(at)synedra(dot)com
PostgreSQL version: 13.2
Operating system: Windows 10
Description:

Hello,

I can reproduce a query planner problem on a fresh default installation of
postgres-13.2 on my Windows 10 machine. I also saw it occur on on
postgres-12.6 installations on Windows 10 and CentOS 8. I used the following
table setup:

CREATE TABLE data_node (id BIGINT PRIMARY KEY);
CREATE TABLE data_entry (id BIGINT PRIMARY KEY, node_fk BIGINT NOT NULL,
FOREIGN KEY (node_fk) REFERENCES data_node (id));
CREATE INDEX node_ix ON data_entry USING BTREE (node_fk);
INSERT INTO data_node (id) SELECT generate_series(1,10);
INSERT INTO data_entry (id, node_fk) SELECT s, 2 FROM
generate_series(1,10000000) s;
VACUUM ANALYZE data_node;
VACUUM ANALYZE data_entry;

With the following query, I filter all entries of the data_node table (only
a single with the data given here) which are referenced by the data_entry
table:

SELECT * FROM data_node
WHERE EXISTS (
SELECT 1 FROM data_entry WHERE node_fk = data_node.id
);

The query takes a few seconds to complete, although I expect it to be much
faster (as shown below, it can be performed in less than a millisecond) due
to the node_ix index. EXPLAIN ANALYZE shows the following query plan:

Merge Join (cost=179053.41..179054.24 rows=1 width=8) (actual
time=2140.820..2140.822 rows=1 loops=1)
Merge Cond: (data_node.id = data_entry.node_fk)
-> Index Only Scan using data_node_pkey on data_node (cost=0.14..8.29
rows=10 width=8) (actual time=0.006..0.010 rows=3 loops=1)
Heap Fetches: 0
-> Sort (cost=179053.27..179053.28 rows=1 width=8) (actual
time=2140.805..2140.806 rows=1 loops=1)
Sort Key: data_entry.node_fk
Sort Method: quicksort Memory: 25kB
-> HashAggregate (cost=179053.25..179053.26 rows=1 width=8)
(actual time=2140.796..2140.796 rows=1 loops=1)
Group Key: data_entry.node_fk
Batches: 1 Memory Usage: 24kB
-> Seq Scan on data_entry (cost=0.00..154053.60 rows=9999860
width=8) (actual time=0.036..973.376 rows=10000000 loops=1)
Planning Time: 0.192 ms
Execution Time: 2140.873 ms

Note that a sequential scan scan on data_entry is used here rather than some
index (only) scans on node_ix. If I

SET enable_seqscan = false;

and EXPLAIN ANALYZE the query again, I get the following result:

Nested Loop Semi Join (cost=0.57..12.85 rows=1 width=8) (actual
time=0.877..0.920 rows=1 loops=1)
-> Index Only Scan using data_node_pkey on data_node (cost=0.14..8.29
rows=10 width=8) (actual time=0.006..0.011 rows=10 loops=1)
Heap Fetches: 0
-> Index Only Scan using node_ix on data_entry (cost=0.43..178143.98
rows=9999860 width=8) (actual time=0.089..0.089 rows=0 loops=10)
Index Cond: (node_fk = data_node.id)
Heap Fetches: 0
Planning Time: 0.328 ms
Execution Time: 0.959 ms

Due to the huge performance gap between the plans, I think that the query
planner should use the node_ix index here even without enable_seqscan being
set to false.

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2021-05-05 14:31:10 Re: BUG #16993: Query Planner does not use index for EXISTS-query on a large table with all equal foreign key values
Previous Message Tomas Vondra 2021-05-05 13:13:24 Re: ogr_fdw missing from /pub/repos/yum/13/fedora/fedora-34-x86_64/