*** syntax.sgml.orig Sun Jan 21 22:17:17 2001 --- syntax.sgml.modi.5 Mon Jan 22 01:24:10 2001 *************** *** 1467,1475 **** - - - GROUP BY and HAVING clauses --- 1467,1472 ---- *************** *** 1482,1540 **** clause.) ! In standard SQL, the GROUP BY clause takes a list of column names, ! that specify a subrow, from the derived input table produced by ! the previous WHERE or FROM clause and partitions the table into ! groups with duplicate subrows such that within a column of the ! subrow, no column value is distinct from other column values. The ! resulting derived input table is a special type of table, called a ! grouped table, which still contains all columns but only ! references to columns of the grouped subrow, and group aggregates, ! derived from any of the columns, may appear in derived column ! value expressions in the query select list. When deriving an ! output table from a query using a grouped input table, each output ! row is derived from a corresponding group/partition of the grouped ! table. Aggregates computed in a derived output column are ! aggregates on the current partition/group of the grouped input ! table being processed. Only one output table row results per ! group/partition of the grouped input table. ! Postgres has extended the GROUP BY clause to allow some ! non-standard, but useful behavior. Derived output columns, given ! names using an AS clause in the query select list, may appear in ! the GROUP BY clause in combination with, or instead of, the input ! table column names. Tables may also be grouped by arbitrary ! expressions. If output table column names appear in the GROUP BY ! list, then the input table is augmented with additional columns of ! the output table columns listed in the GROUP BY clause. The value ! for each row in the additional columns is computed from the value ! expression that defines the output column in the query select ! list. The augmented input table is grouped by the column names ! listed in the GROUP BY clause. The resulting grouped augmented ! input table is then treated according standard SQL GROUP BY ! semantics. Only the columns of the unaugmented input table in the ! grouped subrow (if any), and group aggregates, derived from any of ! the columns of the unaugmented input table, may be referenced in ! the value expressions of the derived output columns of the ! query. Output columns derived with an aggregate expression cannot ! be named in the GROUP BY clause. ! A HAVING clause may optionally follow a GROUP BY clause. The ! HAVING clause selects or eliminates, depending on which ! perspective is taken, groups from the grouped table derived in the ! GROUP BY clause that precedes it. The search condition is the ! same type of expression allowed in a WHERE clause and may ! reference any of the input table column names in the grouped ! subrow, but may not reference any others or any named output ! columns. When the search condition results in TRUE the group is ! retained, otherwise the group is eliminated. ORDER BY and LIMIT clauses --- 1479,1590 ---- clause.) + + GROUP BY grouping_column_reference , grouping_column_reference... + + + + The GROUP BY clause is used to group together rows in a table + that share the same values in all the columns listed. The order + in which the columns are listed doesn't matter as it does in + an ORDER BY clause. The purpose is to reduce each group of rows + sharing common values into one group row that is representative + of all rows in the group. This is done to eliminate redundancy + in the output and/or obtain aggregates that apply to these groups. + + + + Once a table + is grouped, columns that are not included in + the grouping can't be referenced, except in aggregate, since a specific value + in those columns is ambiguous - which row in the group should it come from? + The grouped-by columns can be referenced in select list column expressions + since they have a known constant-value + per group. Aggregate functions on the ungrouped columns + provide aggregates that span the rows of a group, not + of the whole table. For instance, a sum(sales) on a grouped table + by product code gives the total sales for each product, not the + total sales on all products. The aggregates of the ungrouped + columns are representative of the group, whereas their individual + values may not be. + + + + In strict SQL, GROUP BY can only group by columns of the source + table but Postgres extends this to also allow GROUP BY to group + by target columns in the query select list. Grouping by + value expressions instead of simple column names is also allowed. + + + + Example: + + + + + SELECT pid, p.name, (sum(s.units) * p.price) AS sales + FROM products p LEFT JOIN sales s USING ( pid ) + GROUP BY pid, p.name, p.price; + + + + + In this example, the columns pid, p.name, and p.price + must be in the GROUP BY clause since they are + referenced in the query select list. The column + s.units doesn't have to be in the GROUP BY list since + it is only used in an aggregate, which represents + the group of sales of a product. For each + product, a summary row is returned about all sales + of the product. + + + + + HAVING search_condition + + + ! If a table has been grouped using a GROUP BY clause, but then ! only certain groups are of interest, then the HAVING clause ! can be used, much like a WHERE clause, to select out the ! groups of interest from a grouped table. For some queries, ! Postgres allows a HAVING clause to be used without a GROUP BY ! and then it acts just like another WHERE clause, but the point ! in using HAVING that way is not clear. Since HAVING operates ! on groups, only grouped columns can be listed in the HAVING ! clause. If selection based on some ungrouped column is ! desired, it should be expressed in the WHERE clause. ! Example: ! ! SELECT pid AS "Products", ! p.name AS "Over 5000", ! (sum(s.units) * (p.price - p.cost)) ! AS "Past Month Profit" ! FROM products p LEFT JOIN sales s USING ( pid ) ! WHERE p.date > CURRENT_DATE - INTERVAL '4 week' ! GROUP BY pid, p.name, p.price, p.cost ! HAVING p.price > 5000; ! ! ! ! ! In the example above, the WHERE clause is selecting rows by a ! column that is not grouped, while the HAVING clause ! is selecting groups with a price greater than 5000. + + + + ORDER BY and LIMIT clauses