15th April 2019: PostgreSQL Code of Conduct Committee 2018 Annual Report

Development Versions:
devel

PostgreSQL 9.3.25 Documentation | ||||
---|---|---|---|---|

Prev | Up | Chapter 9. Functions and Operators | Next |

*Aggregate functions* compute a single
result from a set of input values. The built-in aggregate functions
are listed in Table 9-47
and Table
9-48. The special syntax considerations for aggregate functions
are explained in Section 4.2.7. Consult
Section 2.7 for additional
introductory information.

**Table 9-47. General-Purpose Aggregate Functions**

Function | Argument Type(s) | Return Type | Description |
---|---|---|---|

`array_agg(` |
any | array of the argument type | input values, including nulls, concatenated into an array |

`avg(` |
smallint, int,
bigint, real,
double precision, numeric, or interval |
numeric for any integer-type argument,
double precision for a floating-point
argument, otherwise the same as the argument data type |
the average (arithmetic mean) of all input values |

`bit_and(` |
smallint, int,
bigint, or bit |
same as argument data type | the bitwise AND of all non-null input values, or null if none |

`bit_or(` |
smallint, int,
bigint, or bit |
same as argument data type | the bitwise OR of all non-null input values, or null if none |

`bool_and(` |
bool |
bool |
true if all input values are true, otherwise false |

`bool_or(` |
bool |
bool |
true if at least one input value is true, otherwise false |

`count(*)` |
bigint |
number of input rows | |

`count(` |
any | bigint |
number of input rows for which the value of expression is not null |

`every(` |
bool |
bool |
equivalent to `bool_and` |

`json_agg(` |
any |
json |
aggregates values as a JSON array |

`max(` |
any array, numeric, string, or date/time type | same as argument type | maximum value of expression
across all input values |

`min(` |
any array, numeric, string, or date/time type | same as argument type | minimum value of expression
across all input values |

`string_agg(` |
(text, text) or
(bytea, bytea) |
same as argument types | input values concatenated into a string, separated by delimiter |

`sum(` |
smallint, int,
bigint, real,
double precision, numeric, interval, or money |
bigint for smallint
or int arguments, numeric for bigint arguments,
otherwise the same as the argument data type |
sum of expression across all
input values |

`xmlagg(` |
xml |
xml |
concatenation of XML values (see also Section 9.14.1.7) |

It should be noted that except for `count`

, these functions return a null value when
no rows are selected. In particular, `sum`

of no rows returns null, not zero as one
might expect, and `array_agg`

returns
null rather than an empty array when there are no input rows. The
`coalesce`

function can be used to
substitute zero or an empty array for null when necessary.

Note:Boolean aggregates`bool_and`

and`bool_or`

correspond to standard SQL aggregates`every`

and`any`

or`some`

. As for`any`

and`some`

, it seems that there is an ambiguity built into the standard syntax:SELECT b1 = ANY((SELECT b2 FROM t2 ...)) FROM t1 ...;Here

`ANY`

can be considered either as introducing a subquery, or as being an aggregate function, if the subquery returns one row with a Boolean value. Thus the standard name cannot be given to these aggregates.

Note:Users accustomed to working with other SQL database management systems might be disappointed by the performance of the`count`

aggregate when it is applied to the entire table. A query like:SELECT count(*) FROM sometable;will require effort proportional to the size of the table: PostgreSQL will need to scan either the entire table or the entirety of an index which includes all rows in the table.

The aggregate functions `array_agg`

,
`json_agg`

, `string_agg`

, and `xmlagg`

, as well as similar user-defined aggregate
functions, produce meaningfully different result values depending
on the order of the input values. This ordering is unspecified by
default, but can be controlled by writing an `ORDER BY` clause within the aggregate call, as shown
in Section
4.2.7. Alternatively, supplying the input values from a sorted
subquery will usually work. For example:

SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;

But this syntax is not allowed in the SQL standard, and is not portable to other database systems.

Table
9-48 shows aggregate functions typically used in statistical
analysis. (These are separated out merely to avoid cluttering the
listing of more-commonly-used aggregates.) Where the description
mentions `N`, it means the number of
input rows for which all the input expressions are non-null. In all
cases, null is returned if the computation is meaningless, for
example when `N` is zero.

**Table 9-48. Aggregate Functions for Statistics**

Function | Argument Type | Return Type | Description |
---|---|---|---|

`corr(` |
double precision |
double precision |
correlation coefficient |

`covar_pop(` |
double precision |
double precision |
population covariance |

`covar_samp(` |
double precision |
double precision |
sample covariance |

`regr_avgx(` |
double precision |
double precision |
average of the independent variable (sum()X)/N |

`regr_avgy(` |
double precision |
double precision |
average of the dependent variable (sum()Y)/N |

`regr_count(` |
double precision |
bigint |
number of input rows in which both expressions are nonnull |

`regr_intercept(` |
double precision |
double precision |
y-intercept of the least-squares-fit linear equation determined
by the (X, Y) pairs |

`regr_r2(` |
double precision |
double precision |
square of the correlation coefficient |

`regr_slope(` |
double precision |
double precision |
slope of the least-squares-fit linear equation determined by
the (X, Y) pairs |

`regr_sxx(` |
double precision |
double precision |
sum( ("sum of
squares" of the independent variable)X^2) -
sum(X)^2/N |

`regr_sxy(` |
double precision |
double precision |
sum(
("sum of products" of independent times
dependent variable)X*Y) -
sum(X) * sum(Y)/N |

`regr_syy(` |
double precision |
double precision |
sum( ("sum of
squares" of the dependent variable)Y^2) -
sum(Y)^2/N |

`stddev(` |
smallint, int,
bigint, real,
double precision, or numeric |
double precision for floating-point
arguments, otherwise numeric |
historical alias for `stddev_samp` |

`stddev_pop(` |
smallint, int,
bigint, real,
double precision, or numeric |
double precision for floating-point
arguments, otherwise numeric |
population standard deviation of the input values |

`stddev_samp(` |
smallint, int,
bigint, real,
double precision, or numeric |
double precision for floating-point
arguments, otherwise numeric |
sample standard deviation of the input values |

`variance` (expression) |
smallint, int,
bigint, real,
double precision, or numeric |
double precision for floating-point
arguments, otherwise numeric |
historical alias for `var_samp` |

`var_pop` (expression) |
smallint, int,
bigint, real,
double precision, or numeric |
double precision for floating-point
arguments, otherwise numeric |
population variance of the input values (square of the population standard deviation) |

`var_samp` (expression) |
smallint, int,
bigint, real,
double precision, or numeric |
double precision for floating-point
arguments, otherwise numeric |
sample variance of the input values (square of the sample standard deviation) |