BUG #17544: Join order for INNER JOIN ... USING with GROUP BY on join column affects selected query plan

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: _(at)ericsheng(dot)com
Subject: BUG #17544: Join order for INNER JOIN ... USING with GROUP BY on join column affects selected query plan
Date: 2022-07-10 04:52:26
Message-ID: 17544-ebd06b00b8836a04@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 17544
Logged by: Eric Sheng
Email address: _(at)ericsheng(dot)com
PostgreSQL version: 14.4
Operating system: Ubuntu 20.04
Description:

It seems like the join order of a simple INNER JOIN ... USING(...) affects
query plan selection even when only joining two tables, when used with GROUP
BY on the join column.

Schema: groups with priority, containing zero or more tasks with a boolean

CREATE TABLE groups(group_id INT PRIMARY KEY NOT NULL, priority INT);
CREATE TABLE tasks(task_id INT PRIMARY KEY NOT NULL, group_id INT NOT
NULL REFERENCES groups(group_id), finished BOOLEAN NOT NULL);
CREATE INDEX groups_priority_index ON groups(priority, group_id);
CREATE INDEX tasks_group_id_index ON tasks(group_id);

Data: 1000 groups (random priority) with 1000 tasks per group

DO $$
BEGIN
FOR i IN 0 .. 999 LOOP
INSERT INTO groups(group_id, priority) VALUES(i, FLOOR(RANDOM() *
2000000000)::INT);
FOR j in 0 .. 999 LOOP
INSERT INTO tasks(task_id, group_id, finished) VALUES (1000 * i
+ j, i, RANDOM() * 100 < 99);
END LOOP;
END LOOP;
END;
$$;

Queries: fetching the lowest priority groups and whether or not all tasks in
them have finished (same query twice, just different join order)

EXPLAIN (ANALYZE, TIMING) SELECT group_id, BOOL_AND(finished) FROM
groups INNER JOIN tasks USING(group_id) GROUP BY group_id, priority ORDER BY
priority ASC LIMIT 10;


QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=15771.99..15772.02 rows=10 width=9) (actual
time=293.054..302.919 rows=10 loops=1)
-> Sort (cost=15771.99..15777.64 rows=2260 width=9) (actual
time=293.051..302.914 rows=10 loops=1)
Sort Key: groups.priority
Sort Method: top-N heapsort Memory: 25kB
-> Finalize HashAggregate (cost=15700.56..15723.16 rows=2260
width=9) (actual time=292.744..302.777 rows=1000 loops=1)
Group Key: groups.group_id
Batches: 1 Memory Usage: 241kB
-> Gather (cost=15203.36..15677.96 rows=4520 width=9)
(actual time=291.308..301.725 rows=2836 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial HashAggregate (cost=14203.36..14225.96
rows=2260 width=9) (actual time=263.344..263.560 rows=945 loops=3)
Group Key: groups.group_id
Batches: 1 Memory Usage: 241kB
Worker 0: Batches: 1 Memory Usage: 241kB
Worker 1: Batches: 1 Memory Usage: 241kB
-> Hash Join (cost=60.85..11725.61
rows=495550 width=9) (actual time=0.719..182.977 rows=333333 loops=3)
Hash Cond: (tasks.group_id =
groups.group_id)
-> Parallel Seq Scan on tasks
(cost=0.00..10361.50 rows=495550 width=5) (actual time=0.044..61.366
rows=333333 loops=3)
-> Hash (cost=32.60..32.60 rows=2260
width=8) (actual time=0.627..0.627 rows=1000 loops=3)
Buckets: 4096 Batches: 1 Memory
Usage: 72kB
-> Seq Scan on groups
(cost=0.00..32.60 rows=2260 width=8) (actual time=0.042..0.298 rows=1000
loops=3)
Planning Time: 0.310 ms
Execution Time: 303.294 ms

EXPLAIN (ANALYZE, TIMING) SELECT group_id, BOOL_AND(finished) FROM tasks
INNER JOIN groups USING(group_id) GROUP BY group_id, priority ORDER BY
priority ASC LIMIT 10;


QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.70..1.35 rows=10 width=9) (actual time=1.146..10.606
rows=10 loops=1)
-> GroupAggregate (cost=0.70..64932.27 rows=1000000 width=9)
(actual time=1.144..10.599 rows=10 loops=1)
Group Key: groups.priority, tasks.group_id
-> Nested Loop (cost=0.70..47432.27 rows=1000000 width=9)
(actual time=0.055..8.028 rows=10001 loops=1)
-> Index Only Scan using groups_priority_index on groups
(cost=0.28..43.27 rows=1000 width=8) (actual time=0.028..0.036 rows=11
loops=1)
Heap Fetches: 0
-> Index Scan using tasks_group_id_index on tasks
(cost=0.42..37.39 rows=1000 width=5) (actual time=0.017..0.433 rows=909
loops=11)
Index Cond: (group_id = groups.group_id)
Planning Time: 0.878 ms
Execution Time: 10.672 ms

All join orders for inner joins give semantically equivalent results, and
documentation at https://www.postgresql.org/docs/current/explicit-joins.html
indicates that the planner should explore all join orders unless there are
too many tables, so I would have expected the join order here to be
immaterial to the query plan chosen.

Digging a bit deeper, this appears to be due to the USING clause causing the
GROUP BY group_id to be rewritten to `groups.group_id` in the first query
and `tasks.group_id` in the second query, and the former resulting in the
simpler plan not being used. The same difference can be observed running:

EXPLAIN (ANALYZE, TIMING) SELECT groups.group_id, BOOL_AND(finished)
FROM tasks INNER JOIN groups ON tasks.group_id = groups.group_id GROUP BY
groups.group_id, priority ORDER BY priority ASC LIMIT 10;
(the slow plan)

EXPLAIN (ANALYZE, TIMING) SELECT tasks.group_id, BOOL_AND(finished) FROM
tasks INNER JOIN groups ON tasks.group_id = groups.group_id GROUP BY
tasks.group_id, priority ORDER BY priority ASC LIMIT 10;
(the fast plan)

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Jiří Ledvinka 2022-07-11 08:05:53 RE: server closed the connection unexpectedly - bug report
Previous Message PG Bug reporting form 2022-07-09 16:08:02 BUG #17543: CSVLOG malformed from disk space error