Re: WITHIN GROUP patch

From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Vik Fearing <vik(dot)fearing(at)dalibo(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Subject: Re: WITHIN GROUP patch
Date: 2013-12-06 08:13:12
Message-ID: 87eh5qh34u.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>>>>> "Tom" == Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

Tom> Further questions about WITHIN GROUP:

Tom> I believe that the spec requires that the "direct" arguments of
Tom> an inverse or hypothetical-set aggregate must not contain any
Tom> Vars of the current query level.

False.

The spec requires that the direct arguments must not contain _ungrouped_
columns (see <set function specification>), but nowhere are grouped
columns forbidden.

Tom> They don't manage to say that in plain English, of course, but
Tom> in the <hypothetical set function> case the behavior is defined
Tom> in terms of this sub-select:

Tom> ( SELECT 0, SK1, ..., SKK
Tom> FROM TNAME UNION ALL
Tom> VALUES (1, VE1, ..., VEK) )

"TNAME" there is not the input table or an alias for it, but rather
the specific subset of rows to which the grouping operation is being
applied (after applying a FILTER if any).

We're in the General Rules here, not the Syntax Rules, so this is
describing _how to compute the result_ rather than a syntactic
transformation of the input.

In any event, going by the docs on the web, Oracle does not forbid
grouped columns there (their wording is "This expr must be constant
within each aggregation group.") MSSQL seems to require a literal
constant, but that's obviously not per spec. IBM doesn't seem to
have it in db2 for linux, but some of their other products have it
and include examples of using grouped vars: see

http://pic.dhe.ibm.com/infocenter/ntz/v7r0m3/index.jsp?topic=%2Fcom.ibm.nz.dbu.doc%2Fc_dbuser_hypothetical_set_family_syntax.html

So I'm going to say that the majority opinion is on my side here.

Tom> So that's all fine, but the patch seems a bit confused about it.

The patch treats vars in the direct args exactly as it would treat
them anywhere else where they must be ungrouped.

[snip a bunch of stuff based on false assumptions]

Tom> What I now think we should do about the added pg_aggregate
Tom> columns is to have:

Tom> aggnfixedargs int number of fixed arguments, excluding any
Tom> hypothetical ones (always 0 for normal aggregates)

Tom> aggkind char 'n' for normal aggregate,
Tom> 'o' for ordered set function,
Tom> 'h' for hypothetical-set function

I don't see any obvious problem with this.

Tom> with the parsing rules that given

Tom> agg( n fixed arguments ) WITHIN GROUP ( ORDER BY k sort specifications )

Tom> 1. WITHIN GROUP is disallowed for normal aggregates.

(This is what the submitted patch does.)

Tom> 2. For an ordered set function, n must equal aggnfixedargs. We
Tom> treat all n fixed arguments as contributing to the aggregate's
Tom> result collation, but ignore the sort arguments.

That doesn't work for getting a sensible collation out of
percentile_disc applied to a collatable type. (Which admittedly is an
extension to the spec, which allows only numeric and interval, but it
seems to me to be worth having.)

Tom> 3. For a hypothetical-set function, n must equal aggnfixedargs
Tom> plus k, and we match up type and collation info of the last k
Tom> fixed arguments with the corresponding sort arguments. The
Tom> first n-k fixed arguments contribute to the aggregate's result
Tom> collation, the rest not.

The submitted patch does essentially this but taking the number of
non-variadic args in place of the suggested aggnfixedargs. Presumably
in your version the latter would be derived from the former?

Tom> Reading back over this email, I see I've gone back and forth
Tom> between using the terms "direct args" and "fixed args" for the
Tom> evaluate-once stuff to the left of WITHIN GROUP. I guess I'm
Tom> not really sold on either terminology, but we need something we
Tom> can use consistently in the code and docs. The spec is no help,
Tom> it has no generic term at all for these args. Does anybody else
Tom> have a preference, or maybe another suggestion entirely?

We (Atri and I) have been using "direct args", but personally I'm not
amazingly happy with it. Documentation for other dbs tends to just call
them "arguments", and refer to the WITHIN GROUP expressions as "ordering
expressions" or similar.

--
Andrew (irc:RhodiumToad)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2013-12-06 08:49:33 Re: ANALYZE sampling is too good
Previous Message Amit Kapila 2013-12-06 06:59:29 Re: Performance Improvement by reducing WAL for Update Operation