Removing Functionally Dependent GROUP BY Columns

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Removing Functionally Dependent GROUP BY Columns
Date: 2015-12-01 04:00:19
Message-ID: CAKJS1f_UZ_MXtpot6EPXsgHSujoUCrKuXYHLH06h072rDXsCzw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

We already allow a SELECT's target list to contain non-aggregated columns
in a GROUP BY query in cases where the non-aggregated column is
functionally dependent on the GROUP BY clause.

For example a query such as;

SELECT p.product_id,p.description, SUM(s.quantity)
FROM product p
INNER JOIN sale s ON p.product_id = s.product_id
GROUP BY p.product_id;

is perfectly fine in PostgreSQL, as p.description is functionally dependent
on p.product_id (assuming product_id is the PRIMARY KEY of product).

It seems that there's no shortage of relational databases in existence
today which don't support this. These databases would require the GROUP BY
clause to include the p.description column too.

It seems rather unfortunate that people who migrate applications to
PostgreSQL may not be aware that we support this, as currently if we
needlessly include the p.description column, PostgreSQL naively includes
this column while grouping. These people could well be incurring a
performance penalty due to our planner not removing the useless items from
the list, as if the primary key is present, then including any other
columns won't cause splitting of the groups any further, all other columns
from the *same relation* can simply be removed from the GROUP BY clause.

There are in fact also two queries in TPC-H (Q10 and Q18) which are written
to include all of the non-aggregated column in the GROUP BY list. During a
recent test I witnessed a 50% gain in performance in Q10 by removing the
unneeded columns from the GROUP BY clause.

I've attached a patch which implements this in PostgreSQL.

The patch may need a little more work in order to adjust the targetlist's
tleSortGroupRefs to remove invalid ones and perhaps also remove the gaps.

I'm posting this now so that I can gauge the community interest in this.

Is it something that we'd like to have in PostgreSQL?

--
David Rowley http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
prune_group_by_clause_2027f512_2015-12-01.patch application/octet-stream 11.4 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Marko Tiikkaja 2015-12-01 04:09:29 Re: Removing Functionally Dependent GROUP BY Columns
Previous Message Michael Paquier 2015-12-01 02:53:35 Re: Error with index on unlogged table