DATACUBE operator for PostgreSQL. © Copyright 2003, Srikanth M and Sumit Kumar Mukherjee DATACUBE is an operator which generalizes relational aggregates. Since this is useful in building datawarehouses for OLAP (Online Analytical Processing), we felt the need to implement this in PostgreSQL. Oracle and MySQL already have one. In keeping with this, we first define the syntax of a CUBE statement as follows: SELECT INTO FROM WHERE GROUP BY HAVING WITH CUBE; Of the above, SELECT, FROM and GROUP BY fields are mandatory. Depending on whether the querytree has an INTO structure, we redirect the result of the CUBE query to the required table. Example: -------- Let the fact table be pets: type | store | number | cost --------+--------+--------+------ Turtle | Tampa | 4 | 20 Dog | Tampa | 14 | 20 Cat | Naples | 9 | 17 Dog | Naples | 5 | 20 Turtle | Naples | 1 | 20 Dog | Miami | 12 | 13 Cat | Miami | 18 | 13 The CUBE query can be given as: db=# SELECT type, store, sum(number) FROM pets GROUP BY type, store WITH CUBE; type | store | sum --------+--------+----- Cat | Miami | 18 Cat | Naples | 9 Cat | | 27 Dog | Miami | 12 Dog | Naples | 5 Dog | Tampa | 14 Dog | | 31 Turtle | Naples | 1 Turtle | Tampa | 4 Turtle | | 5 | Miami | 30 | Naples | 15 | Tampa | 18 | | 63 (14 rows) NOTE: ----- a) As of now the CUBE operator works only for queries with a CUBE operator in the outermost level. That is, a query like SELECT type, store, max(number) FROM pets GROUP BY type, store WITH CUBE; is valid. But a query with a nested CUBE operator like in SELECT * FROM pets WHERE (type, store, number) IN (SELECT type, store, avg(number) FROM pets GROUP BY type, store WITH CUBE); is invalid. b) Also, the NULL fields that can be seen in the result actually represent ALL. So one should set null to ALL before running the query to get an exact idea. We would be glad if someone could come up with a modification so that the null display is set to ALL when a CUBE query is called and then reset to normal once execution is done. c) The current algorithm follows a naive method of generating a UNION of SELECT queries to obtain the CUBE. An even more efficient method is currently being devised and we hope to complete it by the next release. The Implementation: ------------------- This section is for those who are interested in the logic behind the implementation of the CUBE operator. Further changes/suggestions are welcome. Just before the execution of the query starts (in postgres.c), we check if the current querytree has the T_CubeSelectStmt nodeTag. If it does, then we invoke the function constructCube defined in src/backend/tcop/datacube.c constructCube is thhe main routine which handles the generation of the DATACUBE. This function takes in as input the querytree, the TopMemoryContext (used while invoking pg_exec_query_string), and a temporary table name generated using gettimeofday(). The main processes which take place in this routine are: a) Retrieve the actual query string from the querytree using the function get_query_def defined in ruleutils.c b) Create a table with the generated name with the result of the actual query WITHOUT the CUBE part. c) Find out the attributes in the targetlist and extract the aggregatelist into aggregateAttrsList. d) Generate the possible combinations of attributes possible using the function generateCombinations(). e) Using these generated combinations, construct the actual query to be run on the table generated in (b). f) Execute this query. If the original query had an 'into', save the result of the CUBE query to that particular table. Else display the result to the user. For further details, please have a look at the source code in datacube.c