Re: Removing Functionally Dependent GROUP BY Columns

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Julien Rouhaud <julien(dot)rouhaud(at)dalibo(dot)com>
Cc: Marko Tiikkaja <marko(at)joh(dot)to>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Removing Functionally Dependent GROUP BY Columns
Date: 2016-01-14 00:30:31
Message-ID: CAKJS1f_WgEuiazFEoXpTB_vTDZ_FHKrPxwZ0BiQdMwQ_4kmYmA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Many thanks for the thorough review!

On 12 January 2016 at 23:37, Julien Rouhaud <julien(dot)rouhaud(at)dalibo(dot)com>
wrote:

>
> In identify_key_vars()
>
> + /* Only PK constraints are of interest for now, see comment above
> */
> + if (con->contype != CONSTRAINT_PRIMARY)
> + continue;
>
> I think the comment should better refer to
> remove_useless_groupby_columns() (don't optimize further than what
> check_functional_grouping() does).
>
>
I've improved this by explaining it more clearly.

>
> + /*
> + * Exit if the constraint is deferrable, there's no point in
> looking
> + * for another PK constriant, as there can only be one.
> + */
>
> small typo on "constriant"
>
>
Fixed.

>
>
> + {
> + varlistmatches = bms_add_member(varlistmatches,
> varlistidx);
> + found_col = true;
> + /* don't break, there might be dupicates */
> + }
>
> small typo on "dupicates". Also, I thought transformGroupClauseExpr()
> called in the parse analysis make sure that there isn't any duplicate in
> the GROUP BY clause. Do I miss something?
>
>
No I missed something :) You are right. I've put a break; here and a
comment to explain why.

>
> in remove_useless_groupby_columns()
>
> + /*
> + * Skip if there were no Vars found in the GROUP BY clause which
> belong
> + * to this relation. We can also skip if there's only 1 Var, as
> there
> + * needs to be more than one Var for there to be any columns
> which are
> + * functionally dependent on another column.
> + */
>
>
> This comment doesn't sound very clear. I'm not a native english speaker,
> so maybe it's just me.
>
>
Yes it was badly worded. I've fixed in the attached.

> + /*
> + * If we found any surplus Vars in the GROUP BY clause, then we'll
> build
> + * a new GROUP BY clause without these surplus Vars.
> + */
> + if (anysurplus)
> + {
> + List *new_groupby = NIL;
> +
> + foreach (lc, root->parse->groupClause)
> + {
> + SortGroupClause *sgc = (SortGroupClause *) lfirst(lc);
> + TargetEntry *tle;
> + Var *var;
> +
> + tle = get_sortgroupclause_tle(sgc, root->parse->targetList);
> + var = (Var *) tle->expr;
> +
> + if (!IsA(var, Var))
> + continue;
> + [...]
>
> if var isn't a Var, it needs to be added in new_groupby.
>
>
Yes you're right, well, at least I've written enough code to ensure that
it's not needed.
Technically we could look inside non-Vars and check if the Expr is just
made up of Vars from a single relation, and then we could also make that
surplus if we find other Vars which make up the table's primary key. I
didn't make these changes as I thought it was a less likely scenario. It
wouldn't be too much extra code to add however. I've went and added an XXX
comment to explain that there might be future optimisation opportunities in
the future around this.

I've attached an updated patch.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
prune_group_by_clause_90b5f3c_2016-01-14.patch application/octet-stream 12.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Gavin Flower 2016-01-14 01:08:36 Re: Truncation of identifiers
Previous Message Tom Lane 2016-01-14 00:05:30 Re: Truncation of identifiers