1) OS/Configuration64-bit Windows 7 Enterprise with 8G RAM and a Dual Core 2.67 Ghz Intel CPUpostgresql-x64-9.0 (PostgreSQL 9.0.1, compiled by Visual C++ build 1500, 64-bit)work_mem 2GBshared_buffers = 22) Datasetname,pages,tuples,pg_size_pretty"pivotbad";1870;93496;"15 MB""pivotgood";5025;251212;"39 MB"
3) EXPLAIN (ANALYZE ON, BUFFERS ON)"Hash Join (cost=16212.30..52586.43 rows=92869 width=17) (actual time=25814.222..32296.765 rows=3163 loops=1)"" Hash Cond: (((pb.id)::text = (pg.id)::text) AND ((pb.question)::text = (pg.question)::text))"" Join Filter: ((COALESCE(pb.response, 'MISSING'::character varying))::text <> (COALESCE(pg.response, 'MISSING'::character varying))::text)"" Buffers: shared hit=384 read=6511, temp read=6444 written=6318"" -> Seq Scan on pivotbad pb (cost=0.00..2804.96 rows=93496 width=134) (actual time=0.069..37.143 rows=93496 loops=1)"" Buffers: shared hit=192 read=1678"" -> Hash (cost=7537.12..7537.12 rows=251212 width=134) (actual time=24621.752..24621.752 rows=251212 loops=1)"" Buckets: 1024 Batches: 64 Memory Usage: 650kB"" Buffers: shared hit=192 read=4833, temp written=4524"" -> Seq Scan on pivotgood pg (cost=0.00..7537.12 rows=251212 width=134) (actual time=0.038..117.780 rows=251212 loops=1)"" Buffers: shared hit=192 read=4833""Total runtime: 32297.305 ms"
4) INDEXESI can certainly add an index but given the table sizes I am not sure if that is a factor. This by no means is a large dataset less than 350,000 rows in total and 3 columns. Also this was just a quick dump of data for comparison purpose. When I saw the poor performance on the COALESCE, I pointed the data load to SQL Server and ran the same query except with the TSQL specific ISNULL function.