Re: WITHIN GROUP patch

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Vik Fearing <vik(dot)fearing(at)dalibo(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Subject: Re: WITHIN GROUP patch
Date: 2013-12-06 14:34:13
Message-ID: 9164.1386340453@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> writes:
> "Tom" == Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> Tom> I believe that the spec requires that the "direct" arguments of
> Tom> an inverse or hypothetical-set aggregate must not contain any
> Tom> Vars of the current query level.

> In any event, going by the docs on the web, Oracle does not forbid
> grouped columns there (their wording is "This expr must be constant
> within each aggregation group.") MSSQL seems to require a literal
> constant, but that's obviously not per spec. IBM doesn't seem to
> have it in db2 for linux, but some of their other products have it
> and include examples of using grouped vars: see
> http://pic.dhe.ibm.com/infocenter/ntz/v7r0m3/index.jsp?topic=%2Fcom.ibm.nz.dbu.doc%2Fc_dbuser_hypothetical_set_family_syntax.html

OK, that's reasonably convincing. I think we'll need a HINT or something
to clarify the error message, because it sure looks like those arguments
are "used in an aggregate function".

> Tom> 2. For an ordered set function, n must equal aggnfixedargs. We
> Tom> treat all n fixed arguments as contributing to the aggregate's
> Tom> result collation, but ignore the sort arguments.

> That doesn't work for getting a sensible collation out of
> percentile_disc applied to a collatable type. (Which admittedly is an
> extension to the spec, which allows only numeric and interval, but it
> seems to me to be worth having.)

Meh. I don't think you can have that and also have the behavior that
multiple ORDER BY items aren't constrained to have the same collation;
at least not without some rule that amounts to a special case for
percentile_disc, which I'd resist.

> Tom> 3. For a hypothetical-set function, n must equal aggnfixedargs
> Tom> plus k, and we match up type and collation info of the last k
> Tom> fixed arguments with the corresponding sort arguments. The
> Tom> first n-k fixed arguments contribute to the aggregate's result
> Tom> collation, the rest not.

> The submitted patch does essentially this but taking the number of
> non-variadic args in place of the suggested aggnfixedargs. Presumably
> in your version the latter would be derived from the former?

I'm not on board with using variadic vs non variadic to determine this.
For example, imagine a hypothetical-set function that for some reason
supports only a single sort column; there would be no reason to use
VARIADIC in its definition, and indeed good reason not to. In any
case, I don't think this behavior should be tied to implementation details
of the representation of the function signature, and IMV variadic is
just that --- particularly for VARIADIC ANY, which is nothing more than
a short-cut for overloading the function name with different numbers of
ANY arguments. Once we've got a match that involves N direct arguments
and K ordering arguments, the behavior should be determinate without
respect to just how we got that match.

> Tom> Reading back over this email, I see I've gone back and forth
> Tom> between using the terms "direct args" and "fixed args" for the
> Tom> evaluate-once stuff to the left of WITHIN GROUP. I guess I'm
> Tom> not really sold on either terminology, but we need something we
> Tom> can use consistently in the code and docs. The spec is no help,
> Tom> it has no generic term at all for these args. Does anybody else
> Tom> have a preference, or maybe another suggestion entirely?

> We (Atri and I) have been using "direct args", but personally I'm not
> amazingly happy with it. Documentation for other dbs tends to just call
> them "arguments", and refer to the WITHIN GROUP expressions as "ordering
> expressions" or similar.

Well, given that I was mistaken to think there could be no Vars at all
in them, "fixed" may not be le mot juste. Unless somebody's got an
alternative to "direct", let's go with that.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2013-12-06 14:43:54 Re: Trust intermediate CA for client certificates
Previous Message Kevin Grittner 2013-12-06 14:33:12 Re: Reference to parent query from ANY sublink