Re: Query Performance SQL Server vs. Postgresql

From: Humair Mohammed <humairm(at)hotmail(dot)com>
To: <pavel(dot)stehule(at)gmail(dot)com>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Query Performance SQL Server vs. Postgresql
Date: 2010-11-17 03:53:50
Message-ID: COL115-W3987D9D7DE908C5C47792DA8380@phx.gbl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

There are no indexes on the tables either in SQL Server or Postgresql - I am comparing apples to apples here. I ran ANALYZE on the postgresql tables, after that query performance times are still high 42 seconds with COALESCE and 35 seconds with IS DISTINCT FROM.
Here is the execution plan from Postgresql for qurey - select pb.id from pivotbad pb inner join pivotgood pg on pb.id = pg.id and pb.question = pg.question and coalesce(pb.response,'MISSING') <> coalesce(pg.response,'MISSING')
Execution Time: 42 seconds
"Hash Join (cost=16212.30..48854.24 rows=93477 width=17)"" Hash Cond: (((pb.id)::text = (pg.id)::text) AND ((pb.question)::text = (pg.question)::text))"" Join Filter: ((COALESCE(pb.response, 'MISSING'::character varying))::text <> (COALESCE(pg.response, 'MISSING'::character varying))::text)"" -> Seq Scan on pivotbad pb (cost=0.00..2804.96 rows=93496 width=134)"" -> Hash (cost=7537.12..7537.12 rows=251212 width=134)"" -> Seq Scan on pivotgood pg (cost=0.00..7537.12 rows=251212 width=134)"

And here is the execution plan from SQL Server for query - select pb.id from pivotbad pb inner join pivotgood pg on pb.id = pg.id and pb.question = pg.question and isnull(pb.response,'ISNULL')<> isnull(pg.response,'ISNULL')
Execution Time: < 1 second
Cost: 1% |--Parallelism(Gather Streams)Cost: 31% |--Hash Match(Inner Join, HASH:([pb].[ID], [pb].[Question])=([pg].[ID], [pg].[Question]), RESIDUAL:([master].[dbo].[pivotbad].[ID] as [pb].[ID]=[master].[dbo].[pivotgood].[ID] as [pg].[ID] AND [master].[dbo].[pivotbad].[Question] as [pb].[Question]=[master].[dbo].[pivotgood].[Question] as [pg].[Question] AND [Expr1006]<>[Expr1007])) Cost: 0% |--Bitmap(HASH:([pb].[ID], [pb].[Question]), DEFINE:([Bitmap1008])) Cost: 0% |--Compute Scalar(DEFINE:([Expr1006]=isnull([master].[dbo].[pivotbad].[Response] as [pb].[Response],'ISNULL'))) Cost: 6% |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([pb].[ID], [pb].[Question])) Cost: 12% |--Table Scan(OBJECT:([master].[dbo].[pivotbad] AS [pb])) Cost: 0% |--Compute Scalar(DEFINE:([Expr1007]=isnull([master].[dbo].[pivotgood].[Response] as [pg].[Response],'ISNULL'))) Cost: 17% |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([pg].[ID], [pg].[Question])) Cost: 33% |--Table Scan(OBJECT:([master].[dbo].[pivotgood] AS [pg]), WHERE:(PROBE([Bitmap1008],[master].[dbo].[pivotgood].[ID] as [pg].[ID],[master].[dbo].[pivotgood].[Question] as [pg].[Question])))

> From: pavel(dot)stehule(at)gmail(dot)com
> Date: Tue, 16 Nov 2010 08:12:03 +0100
> Subject: Re: [PERFORM]
> To: humairm(at)hotmail(dot)com
> CC: pgsql-performance(at)postgresql(dot)org
>
> 2010/11/15 Humair Mohammed <humairm(at)hotmail(dot)com>:
> > I have 2 tables with a 200,000 rows of data 3 character/string columns ID,
> > Question and Response. The query below compares the data between the 2
> > tables based on ID and Question and if the Response does not match between
> > the left table and the right table it identifies the ID's where there is a
> > mismatch. Running the query in SQL Server 2008 using the ISNULL function
> > take a few milliseconds. Running the same query in Postgresql takes over 70
> > seconds. The 2 queries are below:
> > SQL Server 2008 R2 Query
> > select t1.id from table1 t1 inner join table2 t2 on t1.id = t2.id and
> > t1.question = t2.question and isnull(t1.response,'ISNULL') <>
> > isnull(t2.response,'ISNULL')
>
> > Postgres 9.1 Query
> > select t1.id from table1 t1 inner join table2 t2 on t1.id = t2.id and
> > t1.question = t2.question and coalesce(t1.response,'ISNULL') <>
> > coalesce(t2.response,'ISNULL')
> > What gives?
>
> I think, so must problem can be in ugly predicate
> coalesce(t1.response,'ISNULL') <>
> > coalesce(t2.response,'ISNULL')
>
> try use a IS DISTINCT OF operator
>
> ... AND t1.response IS DISTINCT t2.response
>
> Regards
>
> Pavel Stehule
>
> p.s. don't use a coalesce in WHERE clause if it is possible.

In response to

  • Re: at 2010-11-16 07:12:03 from Pavel Stehule

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Pavel Stehule 2010-11-17 04:47:51 Re: Query Performance SQL Server vs. Postgresql
Previous Message Robert Haas 2010-11-17 03:07:49 Re: Defaulting wal_sync_method to fdatasync on Linux for 9.1?