array functions - request for opinions (was Re: [PATCHES] array support patch phase 1 patch)

From: Joe Conway <mail(at)joeconway(dot)com>
To: "Hackers (PostgreSQL)" <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: array functions - request for opinions (was Re: [PATCHES] array support patch phase 1 patch)
Date: 2003-05-25 18:14:49
Message-ID: 3ED10819.60609@joeconway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

The discussion below has been taking place on the PATCHES list. I'm
forwarding it to HACKERS to get a wider audience. We're looking for
opinions on (at a minimum):

1) Which of these functions should exist *and* be included in user
documentation
2) Which of these functions are not worth having in the backend at all
3) Specifically WRT array_accum(), is there merit in having a userland
function that aggregates row values into arrays, for use in custom
aggregates.

Note that all of these functions have already been implemented and are
either in CVS (all except str_to_array and array_to_str) or in a
submitted patch. Some are required to implement the SQL 99 standard
syntax, but some are not. Specifically, array_subscript, array_assign,
and singleton_array are not needed to implement other functionality and
would be the primary candidates for removal.

Thanks,

Joe

-------- Original Message --------
Peter Eisentraut wrote:
> Joe Conway writes:
>>I personally don't understand why we should hide them from users. If I
>>prefer to use array_append(var1, var2) rather than (var1 || var2),
>>what's the problem? Its a matter of taste as to which is better.
>
> The problem is that this approach leads to bloat without bound. Maybe
> tomorrow someone prefers append_array(var1, var2) or var1 + var2. The
> standard says it's var1 || var2, there is no technical or substantial
> aesthetical argument against it, so that's what we should use.

I can see your point. But is there any circumstance where the function
will work and the standard syntax won't? Until recently, it was not
possible to assign individual array elements in PL/pgSQL (I think Tom
fixed this). Are there any more examples?

Let's go down the list:

array_accum(anyarray, anyelement)
-- no standard syntax; more on this later
array_append(anyarray, anyelement)
-- can be easily done with "||" operator; is there any case where "||"
won't work that array_append() will? If not, don't document as a
user function.
array_assign(anyarray, integer, anyelement)
-- can be easily done with "var[n] = X" syntax; is there any case
where "var[n] = X" won't work that array_assign() will? If not,
don't document as a user function.
array_cat(anyarray, anyarray)
-- see array_append
array_dims(anyarray)
-- no standard syntax; should be documented.
array_lower(anyarray, integer)
-- no standard syntax; should be documented.
array_prepend(anyelement, anyarray)
-- see array_append
array_subscript(anyarray, integer)
-- can be easily done with "var[n]" syntax; is there any case
where "var[n]" won't work that array_subscript() will? If not,
don't document as a user function.
array_to_str(anyarray, text)
-- no standard syntax; should be documented.
array_upper(anyarray, integer)
-- no standard syntax; should be documented.
singleton_array(anyelement)
-- can be easily done with "array[x]" or "'{x}'" syntax; is there any
case where one of these won't work that singleton_array() will? If
not, don't document as a user function.
str_to_array(text, text)
-- no standard syntax; should be documented.

BTW, should this discussion be on HACKERS or even GENERAL in order to
get a wider audience of opinion?

>>And in any case, array_accum() is intended to be used for building
>>custom aggregates, so that needs to be documented.
> Can you give an example of an aggregate or a class of aggregates where
> this would be useful?
>

There are many discussions on the lists over the past few years in which
people have requested this kind of functionality. There is even a
contrib/intagg that does this for integers only. I don't think there is
any question that there is a demand for the feature. Some examples:

http://fts.postgresql.org/db/msg.html?mid=1021530
http://fts.postgresql.org/db/msg.html?mid=1096592
http://fts.postgresql.org/db/msg.html?mid=1031700
http://fts.postgresql.org/db/msg.html?mid=1353047
http://fts.postgresql.org/db/msg.html?mid=1063738
http://fts.postgresql.org/db/msg.html?mid=1050837
http://fts.postgresql.org/db/msg.html?mid=1066349

Some people would like to simply aggregate rows of data into arrays for
analysis, some want to write custom final functions to post-process the
resulting array (e.g. median). With array_accum() and some of the other
new functions (array_lower & array_upper specifically come to mind),
people have a relatively easy way to create there own custom aggregates
using PL/pgSQL.

With PL/R, I can use array_accum to create an aggregate for just about
any summary statistic that I'm interested in. And while, yes, there is
already an implementation of array_accum in PL/R, it isn't reasonable to
require anyone who wants to use it to install libR also. And, yes, there
are undoubtedly other ways to create those aggregates, but none as
simple to use.

Again, I'd suggest that if we want to debate this much further, we
should move the discussion to the GENERAL and SQL lists so that users
can have a chance to chime in.

Joe

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hans-Jürgen Schönig 2003-05-25 19:37:11 SAP and MySQL ...
Previous Message Rod Taylor 2003-05-25 13:10:15 TODO list