Re: Removing Functionally Dependent GROUP BY Columns

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Julien Rouhaud <julien(dot)rouhaud(at)dalibo(dot)com>, Marko Tiikkaja <marko(at)joh(dot)to>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Removing Functionally Dependent GROUP BY Columns
Date: 2016-01-22 23:44:06
Message-ID: 17423.1453506246@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> writes:
> [ prune_group_by_clause_59f15cf_2016-01-15.patch ]

I spent some time looking through this but decided that it needs some work
to be committable.

The main thing I didn't like was that identify_key_vars seems to have a
rather poorly chosen API: it mixes up identifying a rel's pkey with
sorting through the GROUP BY list. I think it would be better to create
a function that, given a relid, hands back the OID of the pkey constraint
and a Bitmapset of the column numbers in the pkey (or 0/NULL if there's
no pkey or it's deferrable). The column numbers should be offset by
FirstLowInvalidHeapAttributeNumber so that a pkey on OID can be
represented correctly --- compare RelationGetIndexAttrBitmap().

The reason this seems like a more attractive solution is that the output
of the pg_constraint lookup becomes independent of the particular query
and thus is potentially cacheable (in the relcache, say). I do not think
we need to cache it right now but I'd like to have the option to do so.

(I wonder BTW whether check_functional_grouping couldn't be refactored
to use the output of such a function, too.)

Some lesser points:

* I did not like where you plugged in the call to
remove_useless_groupby_columns; there are weird interactions with grouping
sets as to whether it will get called or not, and also that whole chunk
of code is due for refactoring. I'd be inclined to call it from
subquery_planner instead, maybe near where the HAVING clause preprocessing
happens.

* You've got it iterating over every RTE, even those that aren't
RTE_RELATION RTEs. It's pure luck that identify_key_vars doesn't fail
outright when passed a bogus relid, and it's certainly wasteful.

* Both of the loops iterating over the groupClause neglect to check
varlevelsup, thus leading to assert failures or worse if an outer-level
Var is present in the GROUP BY list. (I'm pretty sure outer Vars can
still be present at this point, though I might be wrong.)

* I'd be inclined to see if the relvarlists couldn't be converted into
bitmapsets too. Then the test to see if the pkey is a subset of the
grouping columns becomes a simple and cheap bms_is_subset test. You don't
really need surplusvars either, because you can instead test Vars for
membership in the pkey bitmapsets that pg_constraint.c will be handing
back.

* What you did to join.sql/join.out seems a bit bizarre. The existing
test case that you modified was meant to test something else, and
conflating this behavior with the pre-existing one (and not documenting
it) is confusing as can be. A bit more work on regression test cases
seems indicated.

I'm going to set this back to "Waiting on Author". I think the basic
idea is good but it needs a rewrite.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2016-01-23 00:53:05 Re: Combining Aggregates
Previous Message Alvaro Herrera 2016-01-22 23:09:36 Re: pg_dump fails on domain constraint comments