Re: Final Patch for GROUPING SETS

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Final Patch for GROUPING SETS
Date: 2014-12-22 15:46:16
Message-ID: 19548.1419263176@sss.pgh.pa.us
Lists: pgsql-hackers

Noah Misch <noah(at)leadboat(dot)com> writes:
> On Sat, Dec 13, 2014 at 04:37:48AM +0000, Andrew Gierth wrote:
>> "Tom" == Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>> Tom> That seems pretty grotty from a performance+memory consumption
>> Tom> standpoint. At peak memory usage, each one of the Sort nodes
>> Tom> will contain every input row,

>> Has this objection ever been raised for WindowAgg, which has the same
>> issue?

> I caution against using window function performance as the template for
> GROUPING SETS performance goals. The benefit of GROUPING SETS compared to its
> UNION ALL functional equivalent is 15% syntactic pleasantness, 85% performance
> opportunities. Contrast that with window functions, which are great to have even
> with naive performance, because they enable tasks that are otherwise too hard in SQL.
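Noah's "UNION ALL functional equivalent" can be made concrete with a small Python sketch. The toy rows, the column names `a`, `b`, `v` and the helper names are all hypothetical. `union_all` reads the whole input once per grouping set, as a naive UNION ALL of separate GROUP BY queries would. `single_pass` produces the same rows from one read of the input by keeping one hash table per set. The single-pass version is a hash-based illustration of the kind of performance opportunity being discussed, not the sort-based design in the patch under review.

```python
from collections import defaultdict

# Hypothetical toy input: grouping columns "a" and "b", aggregated value "v".
ROWS = [
    {"a": 1, "b": "x", "v": 10},
    {"a": 1, "b": "y", "v": 20},
    {"a": 2, "b": "x", "v": 30},
]
COLS = ("a", "b")


def pad(keys, key_vals, total):
    """Build one output row; columns outside the grouping set become None (SQL NULL)."""
    rec = {c: None for c in COLS}
    rec.update(zip(keys, key_vals))
    rec["sum_v"] = total
    return rec


def union_all(rows, sets):
    """Naive UNION ALL equivalent: one full pass over the input per grouping set."""
    out, passes = [], 0
    for keys in sets:
        passes += 1
        sums = defaultdict(int)
        for row in rows:
            sums[tuple(row[k] for k in keys)] += row["v"]
        out.extend(pad(keys, kv, total) for kv, total in sums.items())
    return out, passes


def single_pass(rows, sets):
    """Same rows from one pass: every input row updates one hash table per set."""
    tables = [defaultdict(int) for _ in sets]
    for row in rows:  # the input is read exactly once
        for keys, sums in zip(sets, tables):
            sums[tuple(row[k] for k in keys)] += row["v"]
    out = [pad(keys, kv, total)
           for keys, sums in zip(sets, tables)
           for kv, total in sums.items()]
    return out, 1


# Equivalent of GROUP BY GROUPING SETS ((a), (b), ())
SETS = [("a",), ("b",), ()]
naive, naive_passes = union_all(ROWS, SETS)
fused, fused_passes = single_pass(ROWS, SETS)
assert naive == fused  # same rows (and, for this input, the same order)
print(naive_passes, fused_passes)  # 3 1
```

Both functions return identical rows. The difference is the number of passes over the input, which grows with the number of grouping sets in the UNION ALL form.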

The other reason that's a bad comparison is that I've not seen many
queries that use more than a couple of window frames, whereas we have
to expect that the number of grouping sets in typical queries will be
significantly more than "a couple". So we do have to think about what
the performance will be like with a lot of sort steps. I'm also worried
that this use-case may finally force us to do something about the "one
work_mem per sort node" behavior, unless we can hack things so that only
one or two sorts reach max memory consumption concurrently.
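To put rough numbers on the "one work_mem per sort node" concern, here is a back-of-the-envelope Python sketch. The function name, the spill model and the sizes are illustrative assumptions, not PostgreSQL's actual memory accounting. If every sort step can be holding up to work_mem at the same moment, peak memory grows linearly with the number of sort steps. Arranging things so that only one or two sorts are at peak concurrently bounds it instead.

```python
MB = 1024 * 1024


def peak_sort_memory(input_bytes, n_sorts, work_mem_bytes, concurrent_full=None):
    """Rough upper bound on memory held by sort steps at one moment.

    Simplifying assumptions (not PostgreSQL's real accounting): each sort
    keeps up to min(input_bytes, work_mem_bytes) in memory and spills the
    rest to disk, and `concurrent_full` sorts (default: all of them) are
    at that peak simultaneously.
    """
    per_sort = min(input_bytes, work_mem_bytes)
    k = n_sorts if concurrent_full is None else min(concurrent_full, n_sorts)
    return per_sort * k


# Hypothetical numbers: 100 MB of input, 8 sort steps, work_mem = 64MB.
print(peak_sort_memory(100 * MB, 8, 64 * MB) // MB)                     # 512
print(peak_sort_memory(100 * MB, 8, 64 * MB, concurrent_full=2) // MB)  # 128
```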

I still find the ChainAggregate approach too ugly at a system structural
level to accept, regardless of Noah's argument about number of I/O cycles
consumed. We'll be paying for that in complexity and bugs into the
indefinite future, and I wonder if it isn't going to foreclose some other
"performance opportunities" as well.

regards, tom lane
