Re: Final Patch for GROUPING SETS

From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, Noah Misch <noah(at)leadboat(dot)com>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Final Patch for GROUPING SETS
Date: 2015-01-06 20:21:56
Message-ID: 87h9w38zhg.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Herewith the latest version of the patch.

Stuff previously discussed but NOT changed in this version:

1. Still uses the ChainAggregate mechanism. As mentioned before, I'm
happy to abandon this given a better option, but I've not been given
anything to work with yet.

2. Still has GroupedVar. I have no objection in principle to taking
this out, but the details of doing so depend on the chain-agg vs.
possible Append/CTE approach (or other approach), so I don't want to
make unnecessary work by doing this prematurely and having to re-do
it.

Stuff changed:

1. Lotsa comments.

2. Node functions and declarations are re-ordered for consistency.
Renamed "Grouping" expression node to "GroupingFunc".

3. Memory usage is now constrained so that no more than 2, or in some
cases 3, sort nodes in a chain are active at one time. (I've tested
this by monitoring the memory usage for large cubes). (The case of 3
active sorts is if REWIND was requested and we added a sort node to
the input plan; while we have to re-sort the intermediate sorts on a
rewind if there are more than two of them, we keep the originally
sorted input to avoid rewinding the input plan.)

4. A problem of incorrect size estimation was corrected (thinko).

5. Tested, but provisionally rejected, the approach of preferring to
use Bitmapsets rather than integer lists. While there is a slight code
simplification (offset by greater confusion over whether we're dealing
with lists or bitmaps at any given point), and very minor performance
gain on contrived cases, the big drawback is that the order of clauses
in the query is destroyed even when it is surprising to the user to do
so.

6. CUBE and ROLLUP are unreserved, and ruleutils is modified to
schema-qualify uses of them as plain functions in group by clauses, in
addition to adding extra parens as the previous patch did. Accordingly
there are no changes to contrib/cube or contrib/earthdistance now.
(Upgrading from older versions may require --quote-all-identifiers if
for some bizarre reason cube() appears in a group by clause of a view.)

7. Fixed a bug in handling direct args of ordered-set aggs.

8. Some variable name cleanup and general tidying.

This is now one big patch (per Tom's gripe about the previous one
being split up, even though there were reasons for that).

One possible new issue is that the memory usage constraint now means
that the explain analyze output shows no memory stats for most of the
sort nodes. This is arguably more accurate, since if each node
displayed its actual memory usage it would look like the plan uses
more memory than it actually does; but it's still a bit odd. (It
happens because the preemptive rescan call discards the actual
statistics)

--
Andrew (irc:RhodiumToad)

Attachment Content-Type Size
gsp-all.patch text/x-patch 269.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2015-01-06 20:39:29 Re: pg_rewind in contrib
Previous Message Simon Riggs 2015-01-06 20:04:04 Re: parallel mode and parallel contexts