NULL-handling in aggregate(DISTINCT ...)

From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: pgsql-hackers(at)postgresql(dot)org
Subject: NULL-handling in aggregate(DISTINCT ...)
Date: 2009-11-12 01:27:57
Message-ID: 873a4kv8rn.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Quoth the comments in nodeAgg.c:

* We don't currently implement DISTINCT aggs for aggs having more
* than one argument. This isn't required for anything in the SQL
* spec, but really it ought to be implemented for
* feature-completeness. FIXME someday.

and:

* DISTINCT always suppresses nulls, per SQL spec, regardless of the
* transition function's strictness.

(What the SQL spec actually says is that aggregate calls which are
<general set operation> ignore all nulls regardless of whether they
are ALL or DISTINCT. Other kinds of aggregates are not permitted by
the spec to use ALL or DISTINCT.)

Currently we have this behaviour:

postgres=# select array_agg(all a) from (values (1),(null)) v(a);
array_agg
-----------
{1,NULL}
(1 row)

postgres=# select array_agg(distinct a) from (values (1),(null)) v(a);
array_agg
-----------
{1}
(1 row)

which personally I feel is somewhat wrong, since 1 and NULL are in
fact distinct, but which is due to the logic expressed in the second
comment above. (The spec does not allow array_agg(distinct a) so it
is no help here.)

Now the question: If the limit of one argument for DISTINCT aggs were
removed (which I'm considering doing as part of an update to the
aggregate ORDER BY patch I posted a while back), what should be the
behaviour of agg(distinct x,y) where one or both of x or y is null?
And should it depend on the strictness of the transition function?

--
Andrew (irc:RhodiumToad)

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2009-11-12 01:40:14 Re: NULL-handling in aggregate(DISTINCT ...)
Previous Message Josh Berkus 2009-11-12 01:18:49 Re: Partitioning option for COPY