Re: BUG #17544: Join order for INNER JOIN ... USING with GROUP BY on join column affects selected query plan

From: Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To: _(at)ericsheng(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org, PG Bug reporting form <noreply(at)postgresql(dot)org>
Subject: Re: BUG #17544: Join order for INNER JOIN ... USING with GROUP BY on join column affects selected query plan
Date: 2022-07-12 06:34:09
Message-ID: ddd9555b-96d4-ca8d-8b04-f6fec2b90124@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On 7/10/22 09:52, PG Bug reporting form wrote:
> All join orders for inner joins give semantically equivalent results, and
> documentation at https://www.postgresql.org/docs/current/explicit-joins.html
> indicates that the planner should explore all join orders unless there are
> too many tables, so I would have expected the join order here to be
> immaterial to the query plan chosen.
>
> Digging a bit deeper, this appears to be due to the USING clause causing the
> GROUP BY group_id to be rewritten to `groups.group_id` in the first query
> and `tasks.group_id` in the second query, and the former resulting in the
> simpler plan not being used. The same difference can be observed running:
>
> EXPLAIN (ANALYZE, TIMING) SELECT groups.group_id, BOOL_AND(finished)
> FROM tasks INNER JOIN groups ON tasks.group_id = groups.group_id GROUP BY
> groups.group_id, priority ORDER BY priority ASC LIMIT 10;
> (the slow plan)
>
> EXPLAIN (ANALYZE, TIMING) SELECT tasks.group_id, BOOL_AND(finished) FROM
> tasks INNER JOIN groups ON tasks.group_id = groups.group_id GROUP BY
> tasks.group_id, priority ORDER BY priority ASC LIMIT 10;
> (the fast plan)
Documentation speaks the truth - optimizer checks all join permutations.
But USING clause doesn't pull both group_id vars and implicitly chooses
only one according to an algorithm, described in
parse_clause.c::buildMergedJoinVar(). So, because you use group_id in
upper GROUP BY, optimizer is limited to specific set of strategies,
because it must use the same grouping variable, as implicitly chosen in
the JOIN USING clause.

--
Regards
Andrey Lepikhov
Postgres Professional

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Yura Sokolov 2022-07-12 14:13:28 Re: can't drop table due to reference from orphaned temp function
Previous Message David Rowley 2022-07-12 05:00:02 Re: Reducing chunk header sizes on all memory context types