GROUP BY and HAVING clauses

*** syntax.sgml.orig	Sun Jan 21 22:17:17 2001
--- syntax.sgml.modi.5	Mon Jan 22 01:24:10 2001
***************
*** 1467,1475 ****
     </para>
    </sect2>
  
- <!-- This is confusing as heck.  Make it simpler. -->
- 
- <![IGNORE[
  
    <sect2>
     <title>GROUP BY and HAVING clauses</title>	   
--- 1467,1472 ----
***************
*** 1482,1540 ****
      clause.)
     </para>
  
     <para>
!     In standard SQL, the GROUP BY clause takes a list of column names,
!     that specify a subrow, from the derived input table produced by
!     the previous WHERE or FROM clause and partitions the table into
!     groups with duplicate subrows such that within a column of the
!     subrow, no column value is distinct from other column values. The
!     resulting derived input table is a special type of table, called a
!     grouped table, which still contains all columns but only
!     references to columns of the grouped subrow, and group aggregates,
!     derived from any of the columns, may appear in derived column
!     value expressions in the query select list.  When deriving an
!     output table from a query using a grouped input table, each output
!     row is derived from a corresponding group/partition of the grouped
!     table. Aggregates computed in a derived output column are
!     aggregates on the current partition/group of the grouped input
!     table being processed.  Only one output table row results per
!     group/partition of the grouped input table.
     </para>
  
     <para>
!     Postgres has extended the GROUP BY clause to allow some
!     non-standard, but useful behavior. Derived output columns, given
!     names using an AS clause in the query select list, may appear in
!     the GROUP BY clause in combination with, or instead of, the input
!     table column names. Tables may also be grouped by arbitrary
!     expressions. If output table column names appear in the GROUP BY
!     list, then the input table is augmented with additional columns of
!     the output table columns listed in the GROUP BY clause. The value
!     for each row in the additional columns is computed from the value
!     expression that defines the output column in the query select
!     list. The augmented input table is grouped by the column names
!     listed in the GROUP BY clause. The resulting grouped augmented
!     input table is then treated according standard SQL GROUP BY
!     semantics. Only the columns of the unaugmented input table in the
!     grouped subrow (if any), and group aggregates, derived from any of
!     the columns of the unaugmented input table, may be referenced in
!     the value expressions of the derived output columns of the
!     query. Output columns derived with an aggregate expression cannot
!     be named in the GROUP BY clause.
     </para>
  
     <para>
!     A HAVING clause may optionally follow a GROUP BY clause.  The
!     HAVING clause selects or eliminates, depending on which
!     perspective is taken, groups from the grouped table derived in the
!     GROUP BY clause that precedes it.  The search condition is the
!     same type of expression allowed in a WHERE clause and may
!     reference any of the input table column names in the grouped
!     subrow, but may not reference any others or any named output
!     columns.  When the search condition results in TRUE the group is
!     retained, otherwise the group is eliminated.
     </para>
    </sect2>
  
     <sect2>
      <title>ORDER BY and LIMIT clauses</title>
--- 1479,1590 ----
      clause.)
     </para>
  
+ <synopsis>
+ GROUP BY <replaceable>grouping_column_reference</replaceable> <optional>, <replaceable>grouping_column_reference</replaceable></optional>...
+ </synopsis>
+ 
+    <para>
+     The GROUP BY clause is used to group together rows in a table
+ 	that share the same values in all the columns listed. The order
+ 	in which the columns are listed doesn't matter as it does in
+ 	an ORDER BY clause.  The purpose is to reduce each group of rows
+ 	sharing common values into one group row that is representative
+ 	of all rows in the group.  This is done to eliminate redundancy
+ 	in the output and/or obtain aggregates that apply to these groups.
+ 	</para>
+ 	  
+ 	<para>
+ 	Once a table
+ 	is grouped, columns that are not included in
+ 	the grouping can't be referenced, except in aggregate, since a specific value
+ 	in those columns is ambiguous - which row in the group should it come from?
+ 	The grouped-by columns can be referenced in select list column expressions 
+ 	since they have a known constant-value
+ 	per group.  Aggregate functions on the ungrouped columns
+ 	provide aggregates that span the rows of a group, not
+ 	of the whole table.  For instance, a sum(sales) on a grouped table
+ 	by product code gives the total sales for each product, not the 
+ 	total sales on all products.  The aggregates of the ungrouped
+ 	columns are representative of the group, whereas their individual
+ 	values may not be.
+ 	</para>
+ 
+    <para>
+     In strict SQL, GROUP BY can only group by columns of the source
+ 	table but Postgres extends this to also allow GROUP BY to group
+ 	by target columns in the query select list.  Grouping by
+ 	value expressions instead of simple column names is also allowed.
+    </para>
+ 
+    <para>
+     Example:
+    </para>
+ 
+    <para>
+     <programlisting>
+ SELECT pid, p.name, (sum(s.units) * p.price) AS sales
+ FROM products p LEFT JOIN sales s USING ( pid )
+ GROUP BY pid, p.name, p.price;
+     </programlisting>
+    </para>
+ 
+    <para>
+     In this example, the columns pid, p.name, and p.price
+     must be in the GROUP BY clause since they are
+     referenced in the query select list.  The column
+     s.units doesn't have to be in the GROUP BY list since
+     it is only used in an aggregate, which represents
+     the group of sales of a product.  For each
+     product, a summary row is returned about all sales
+     of the product.
+    </para>
+ 
+    <para>
+     <synopsis>
+ HAVING <replaceable>search_condition</replaceable>
+     </synopsis>
+    </para>
+ 
     <para>
!     If a table has been grouped using a GROUP BY clause, but then
!     only certain groups are of interest, then the HAVING clause
!     can be used, much like a WHERE clause, to select out the
!     groups of interest from a grouped table.  For some queries,
!     Postgres allows a HAVING clause to be used without a GROUP BY
!     and then it acts just like another WHERE clause, but the point
!     in using HAVING that way is not clear. Since HAVING operates
!     on groups, only grouped columns can be listed in the HAVING
!     clause.  If selection based on some ungrouped column is
!     desired, it should be expressed in the WHERE clause.
     </para>
  
     <para>
!     Example:
     </para>
  
     <para>
!     <programlisting>
! SELECT pid    AS "Products",
!        p.name AS "Over 5000",
! (sum(s.units) * (p.price - p.cost))
!        AS "Past Month Profit"
! FROM products p LEFT JOIN sales s USING ( pid )
! WHERE p.date > CURRENT_DATE - INTERVAL '4 week'
! GROUP BY pid, p.name, p.price, p.cost
! HAVING p.price > 5000;
!     </programlisting>
!    </para>
! 
!    <para>
!     In the example above, the WHERE clause is selecting rows by a
!     column that is not grouped, while the HAVING clause
!     is selecting groups with a price greater than 5000.
     </para>
    </sect2>
+ 
+ <!-- This is confusing as heck.  Make it simpler. -->
+ 
+ <![IGNORE[
  
     <sect2>
      <title>ORDER BY and LIMIT clauses</title>