Skip site navigation (1) Skip section navigation (2)

pgsql-server/contrib/tablefunc README.tablefun ...

From: momjian(at)postgresql(dot)org (Bruce Momjian - CVS)
To: pgsql-committers(at)postgresql(dot)org
Subject: pgsql-server/contrib/tablefunc README.tablefun ...
Date: 2003-03-20 06:46:30
Message-ID: 20030320064630.DA4A3475C15@postgresql.org (view raw or flat)
Thread:
Lists: pgsql-committers
CVSROOT:	/cvsroot
Module name:	pgsql-server
Changes by:	momjian(at)postgresql(dot)org	03/03/20 01:46:30

Modified files:
	contrib/tablefunc: README.tablefunc tablefunc.c tablefunc.h 
	                   tablefunc.sql.in 
	contrib/tablefunc/expected: tablefunc.out 
	contrib/tablefunc/sql: tablefunc.sql 

Log message:
	Attached is an update to contrib/tablefunc. It implements a new hashed
	version of crosstab. This fixes a major deficiency in real-world use of
	the original version. Easiest to undestand with an illustration:
	
	Data:
	-------------------------------------------------------------------
	select * from cth;
	id | rowid |        rowdt        |   attribute    |      val
	----+-------+---------------------+----------------+---------------
	1 | test1 | 2003-03-01 00:00:00 | temperature    | 42
	2 | test1 | 2003-03-01 00:00:00 | test_result    | PASS
	3 | test1 | 2003-03-01 00:00:00 | volts          | 2.6987
	4 | test2 | 2003-03-02 00:00:00 | temperature    | 53
	5 | test2 | 2003-03-02 00:00:00 | test_result    | FAIL
	6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
	7 | test2 | 2003-03-02 00:00:00 | volts          | 3.1234
	(7 rows)
	
	Original crosstab:
	-------------------------------------------------------------------
	SELECT * FROM crosstab(
	'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
	AS c(rowid text, temperature text, test_result text, test_startdate
	text, volts text);
	rowid | temperature | test_result | test_startdate | volts
	-------+-------------+-------------+----------------+--------
	test1 | 42          | PASS        | 2.6987         |
	test2 | 53          | FAIL        | 01 March 2003  | 3.1234
	(2 rows)
	
	Hashed crosstab:
	-------------------------------------------------------------------
	SELECT * FROM crosstab(
	'SELECT rowid, attribute, val FROM cth ORDER BY 1',
	'SELECT DISTINCT attribute FROM cth ORDER BY 1')
	AS c(rowid text, temperature int4, test_result text, test_startdate
	timestamp, volts float8);
	rowid | temperature | test_result |   test_startdate    | volts
	-------+-------------+-------------+---------------------+--------
	test1 |          42 | PASS        |                     | 2.6987
	test2 |          53 | FAIL        | 2003-03-01 00:00:00 | 3.1234
	(2 rows)
	
	Notice that the original crosstab slides data over to the left in the
	result tuple when it encounters missing data. In order to work around
	this you have to be make your source sql do all sorts of contortions
	(cartesian join of distinct rowid with distinct attribute; left join
	that back to the real source data). The new version avoids this by
	building a hash table using a second distinct attribute query.
	
	The new version also allows for "extra" columns (see the README) and
	allows the result columns to be coerced into differing datatypes if they
	are suitable (as shown above).
	
	In testing a "real-world" data set (69 distinct rowid's, 27 distinct
	categories/attributes, multiple missing data points) I saw about a
	5-fold improvement in execution time (from about 2200 ms old, to 440 ms
	new).
	
	I left the original version intact because: 1) BC, 2) it is probably
	slightly faster if you know that you have no missing attributes.
	
	README and regression test adjustments included. If there are no
	objections, please apply.
	
	Joe Conway


pgsql-committers by date

Next:From: Bruce Momjian - CVSDate: 2003-03-20 07:00:57
Subject: pgsql-server/doc TODO
Previous:From: Bruce Momjian - CVSDate: 2003-03-20 06:43:35
Subject: pgsql-server/src/bin/psql command.c common.c c ...

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group