Re: Aggregate ORDER BY patch

From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us (Tom Lane), pgsql-hackers(at)postgresql(dot)org, Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
Subject: Re: Aggregate ORDER BY patch
Date: 2009-12-15 20:31:08
Message-ID: 87r5qwrood.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>>>>> "Tom" == Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

Tom> Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> writes:
>> Updated version of the aggregate order by patch.

Tom> Applied with some editorialization. The main change I made was
Tom> to get rid of all the ad-hoc DISTINCT handling in parse_agg.c
Tom> and use transformDistinctClause() instead.

I'll review that; I avoided that code intentionally because the
semantics of query-level DISTINCT are different enough.

Tom> This exposed what I believe to be a bug in the submitted patch:
Tom> it accepted cases like

Tom> agg(DISTINCT x ORDER BY x,y)

This is not a bug, it was done intentionally (as you might have
guessed from the fact that there was a regression test for it). The
additional ORDER BY column in this case is always safe (since DISTINCT
adopts the equality operator from the sort, it's not possible for
additional sort columns to break the DISTINCT). I allowed the case
since there was therefore no good reason to forbid it.

There is at least one case where this makes a visible difference in
query output: if the aggregate can distinguish values of x which are
considered equal by the sort operator used, then the value of y
affects which value of x is seen. It is probably relatively easy to
generate examples of this using the box type and array_agg.

Tom> We do not allow that in ordinary query-level DISTINCT

Note that ordinary query-level DISTINCT has the reverse semantics; the
DISTINCT operation is (per spec) logically prior to the order by, the
fact that they are planned in the reverse order is an implementation
detail.

Query-level DISTINCT shouldn't allow columns in the order by that
aren't in the select list because those columns _do not exist_ at the
point that ordering logically takes place (even though in the
implementation, they might).

This isn't the case for aggregate order by.

--
Andrew (irc:RhodiumToad)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message decibel 2009-12-15 20:31:19 Re: Need a mentor, and a project.
Previous Message Tom Lane 2009-12-15 20:20:39 Re: Closing out CommitFest 2009-11