Re: Order of columns in GROUP BY is significant to the planner.

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Pg Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: Order of columns in GROUP BY is significant to the planner.
Date: 2017-12-21 10:13:24
Message-ID: CAKJS1f-ic8+DVSt1GJCG1njkbS5ui4d4-tmmkhSt0Aq2q+B_uQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On 21 December 2017 at 19:16, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> EXPLAIN (BUFFERS, ANALYZE, timing off) SELECT SUM(col5), col2, col1 FROM foo
> WHERE col3 = '4' AND col4 <= '0.9' GROUP BY col2,col1;
>
> EXPLAIN (BUFFERS, ANALYZE, timing off) SELECT SUM(col5), col2, col1 FROM foo
> WHERE col3 = '4' AND col4 <= '0.9' GROUP BY col1,col2;
>
> The first one inserts a sort node on col1,col2 before doing the Group
> Aggregate. The second one uses the ordering of the tuples derived from the
> index scan to do the Group Aggregate directly. Isn't it surprising that the
> order of the columns in the GROUP BY has to be same as the order in the
> index definition in order to make maximal use of the index? Is that a bug?

Not a bug, just the number of combinations to try could end up growing
very large and then you'd likely want or need to re-perform the join
search with each order and keep the cheapest one. Likely it would just
be too slow, especially when there are many tables in the join search
and many columns in the GROUP BY.

There's a comment at the top of preprocess_groupclause() that explains
that we don't do it, it just does not explain why we don't. It really
just mentions that hash agg is probably better in most cases, which
seems like a bit of a cop-out. It likely should just explain that we
err on the side of caution as the planning effort may often outweigh
the benefits we get during execution.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message PG Bug reporting form 2017-12-21 14:02:15 BUG #14988: application server couldnot contacted
Previous Message Jeff Janes 2017-12-21 06:16:02 Order of columns in GROUP BY is significant to the planner.