Skip site navigation (1) Skip section navigation (2)

Re: Query Performance SQL Server vs. Postgresql

From: Humair Mohammed <humairm(at)hotmail(dot)com>
To: <pavel(dot)stehule(at)gmail(dot)com>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Query Performance SQL Server vs. Postgresql
Date: 2010-11-17 21:50:25
Message-ID: COL115-W186A7B93051E6907958ED6A8380@phx.gbl (view raw or flat)
Thread:
Lists: pgsql-performance
Yes strange indeed, I did rerun ANALYZE and VACCUM. Took 70 seconds to rerun the query. Results from EXPLAIN ANALYZE below:
"Hash Join  (cost=16212.30..52586.43 rows=92869 width=17) (actual time=43200.223..49502.874 rows=3163 loops=1)""  Hash Cond: (((pb.id)::text = (pg.id)::text) AND ((pb.question)::text = (pg.question)::text))""  Join Filter: ((COALESCE(pb.response, 'MISSING'::character varying))::text <> (COALESCE(pg.response, 'MISSING'::character varying))::text)""  ->  Seq Scan on pivotbad pb  (cost=0.00..2804.96 rows=93496 width=134) (actual time=0.009..48.200 rows=93496 loops=1)""  ->  Hash  (cost=7537.12..7537.12 rows=251212 width=134) (actual time=42919.453..42919.453 rows=251212 loops=1)""        Buckets: 1024  Batches: 64  Memory Usage: 650kB""        ->  Seq Scan on pivotgood pg  (cost=0.00..7537.12 rows=251212 width=134) (actual time=0.119..173.019 rows=251212 loops=1)""Total runtime: 49503.450 ms"

> From: pavel(dot)stehule(at)gmail(dot)com
> Date: Wed, 17 Nov 2010 05:47:51 +0100
> Subject: Re: Query Performance SQL Server vs. Postgresql
> To: humairm(at)hotmail(dot)com
> CC: pgsql-performance(at)postgresql(dot)org
> 
> 2010/11/17 Humair Mohammed <humairm(at)hotmail(dot)com>:
> >
> > There are no indexes on the tables either in SQL Server or Postgresql - I am
> > comparing apples to apples here. I ran ANALYZE on the postgresql tables,
> > after that query performance times are still high 42 seconds with COALESCE
> > and 35 seconds with IS DISTINCT FROM.
> > Here is the execution plan from Postgresql for qurey - select pb.id from
> > pivotbad pb inner join pivotgood pg on pb.id = pg.id and pb.question =
> > pg.question and coalesce(pb.response,'MISSING') <>
> > coalesce(pg.response,'MISSING')
> > Execution Time: 42 seconds
> > "Hash Join  (cost=16212.30..48854.24 rows=93477 width=17)"
> > "  Hash Cond: (((pb.id)::text = (pg.id)::text) AND ((pb.question)::text =
> > (pg.question)::text))"
> > "  Join Filter: ((COALESCE(pb.response, 'MISSING'::character varying))::text
> > <> (COALESCE(pg.response, 'MISSING'::character varying))::text)"
> > "  ->  Seq Scan on pivotbad pb  (cost=0.00..2804.96 rows=93496 width=134)"
> > "  ->  Hash  (cost=7537.12..7537.12 rows=251212 width=134)"
> > "        ->  Seq Scan on pivotgood pg  (cost=0.00..7537.12 rows=251212
> > width=134)"
> 
> this is little bit strange - did you ANALYZE and VACUUM?
> 
> please send result of EXPLAIN ANALYZE
> 
> Pavel
> 
> >
> > And here is the execution plan from SQL Server for query - select pb.id from
> > pivotbad pb inner join pivotgood pg on pb.id = pg.id and pb.question =
> > pg.question and isnull(pb.response,'ISNULL')<>  isnull(pg.response,'ISNULL')
> > Execution Time: < 1 second
> > Cost: 1%  |--Parallelism(Gather Streams)
> > Cost: 31%       |--Hash Match(Inner Join, HASH:([pb].[ID],
> > [pb].[Question])=([pg].[ID], [pg].[Question]),
> > RESIDUAL:([master].[dbo].[pivotbad].[ID] as
> > [pb].[ID]=[master].[dbo].[pivotgood].[ID] as [pg].[ID] AND
> > [master].[dbo].[pivotbad].[Question] as
> > [pb].[Question]=[master].[dbo].[pivotgood].[Question] as [pg].[Question] AND
> > [Expr1006]<>[Expr1007]))
> >     Cost: 0%  |--Bitmap(HASH:([pb].[ID], [pb].[Question]),
> > DEFINE:([Bitmap1008]))
> >             Cost: 0%    |--Compute
> > Scalar(DEFINE:([Expr1006]=isnull([master].[dbo].[pivotbad].[Response] as
> > [pb].[Response],'ISNULL')))
> >             Cost:  6%   |--Parallelism(Repartition Streams, Hash
> > Partitioning, PARTITION COLUMNS:([pb].[ID], [pb].[Question]))
> >             Cost: 12%  |--Table Scan(OBJECT:([master].[dbo].[pivotbad] AS
> > [pb]))
> >             Cost: 0% |--Compute
> > Scalar(DEFINE:([Expr1007]=isnull([master].[dbo].[pivotgood].[Response] as
> > [pg].[Response],'ISNULL')))
> >                 Cost: 17% |--Parallelism(Repartition Streams, Hash
> > Partitioning, PARTITION COLUMNS:([pg].[ID], [pg].[Question]))
> >                     Cost: 33% |--Table
> > Scan(OBJECT:([master].[dbo].[pivotgood] AS [pg]),
> > WHERE:(PROBE([Bitmap1008],[master].[dbo].[pivotgood].[ID] as
> > [pg].[ID],[master].[dbo].[pivotgood].[Question] as [pg].[Question])))
> >
> >
> >
> >> From: pavel(dot)stehule(at)gmail(dot)com
> >> Date: Tue, 16 Nov 2010 08:12:03 +0100
> >> Subject: Re: [PERFORM]
> >> To: humairm(at)hotmail(dot)com
> >> CC: pgsql-performance(at)postgresql(dot)org
> >>
> >> 2010/11/15 Humair Mohammed <humairm(at)hotmail(dot)com>:
> >> > I have 2 tables with a 200,000 rows of data 3 character/string columns
> >> > ID,
> >> > Question and Response. The query below compares the data between the 2
> >> > tables based on ID and Question and if the Response does not match
> >> > between
> >> > the left table and the right table it identifies the ID's where there is
> >> > a
> >> > mismatch. Running the query in SQL Server 2008 using the ISNULL function
> >> > take a few milliseconds. Running the same query in Postgresql takes over
> >> > 70
> >> > seconds. The 2 queries are below:
> >> > SQL Server 2008 R2 Query
> >> > select t1.id from table1 t1 inner join table2 t2 on t1.id = t2.id and
> >> > t1.question = t2.question and isnull(t1.response,'ISNULL') <>
> >> > isnull(t2.response,'ISNULL')
> >>
> >> > Postgres 9.1 Query
> >> > select t1.id from table1 t1 inner join table2 t2 on t1.id = t2.id and
> >> > t1.question = t2.question and coalesce(t1.response,'ISNULL') <>
> >> > coalesce(t2.response,'ISNULL')
> >> > What gives?
> >>
> >> I think, so must problem can be in ugly predicate
> >> coalesce(t1.response,'ISNULL') <>
> >> > coalesce(t2.response,'ISNULL')
> >>
> >> try use a IS DISTINCT OF operator
> >>
> >> ... AND t1.response IS DISTINCT t2.response
> >>
> >> Regards
> >>
> >> Pavel Stehule
> >>
> >> p.s. don't use a coalesce in WHERE clause if it is possible.
> >
 		 	   		  

In response to

Responses

pgsql-performance by date

Next:From: RichDate: 2010-11-17 21:51:55
Subject: Re: Query Performance SQL Server vs. Postgresql
Previous:From: Jon NelsonDate: 2010-11-17 21:36:44
Subject: Re: Defaulting wal_sync_method to fdatasync on Linux for 9.1?

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group