Skip site navigation (1) Skip section navigation (2)

Should the optimizer see this?

From: Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Cc: rm_pg(at)cheapcomplexdevices(dot)com
Subject: Should the optimizer see this?
Date: 2005-01-27 01:27:59
Message-ID: Pine.LNX.4.58.0501261636010.9564@greenie.cheapcomplexdevices.com (view raw or flat)
Thread:
Lists: pgsql-performance
Short summary... the second query runs faster, and I think
they should be identical queries.   Should the optimizer
have found this optimization?


I have two identical (or so I believe) queries; one where I 
explicitly add a "is not null" comparison; and one where I 
think it would implicitly only find not-null columns.

The queries are 

    select *
      from rt4, rt5
     where rt4.tigerfile = rt5.tigerfile
       and feat = feat3;

and 

    select *
      from (select * from rt4 where feat3 is not null) as rt4, rt5
     where rt4.tigerfile = rt5.tigerfile
       and feat = feat3;

I would have thought that the optimizer would see that
if feat3 is null (which it usually is), it doesn't need
to keep those rows and sort them -- but it seems (looking
both at explain analyze and "du" on the tmp directory)
that in the first query it is indeed sorting all the
rows --- even the ones with feat3=null.

 

The tables are the Census Tiger Line data explained in detail here:
    http://www.census.gov/geo/www/tiger/tiger2003/TGR2003.pdf
I can attach the create statemnts for the tables if people 
think they'd help.   Basically, table rt4 has a column
called feat3 which is usually null, and table rt5 has a
column called feat which is never null.  Both tables have
a few million rows.

No indexes were used, since I'm joining everything to 
everything, they shouldn't have helped anyway.  However
vacuum analyze was run, and (as seen in the second query)
the stats did know that the column feat3 was mostly null.

=====================================================================================================
fli=# 
fli=# explain analyze                                                                                                                       
    select *                                                                                                                                
      from rt4, rt5                                                                                                                         
     where rt4.tigerfile = rt5.tigerfile                                                                                                    
       and feat = feat3;
fli-# fli-# fli-# fli-# 

                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=1922903.02..1967385.35 rows=117698 width=100) (actual time=179246.872..218920.724 rows=153091 loops=1)
   Merge Cond: (("outer".feat3 = "inner".feat) AND ("outer".tigerfile = "inner".tigerfile))
   ->  Sort  (cost=876532.10..888964.80 rows=4973079 width=45) (actual time=57213.327..67313.216 rows=4971022 loops=1)
         Sort Key: rt4.feat3, rt4.tigerfile
         ->  Seq Scan on rt4  (cost=0.00..94198.79 rows=4973079 width=45) (actual time=0.053..10433.883 rows=4971022 loops=1)
   ->  Sort  (cost=1046370.92..1060457.95 rows=5634813 width=55) (actual time=122033.463..134037.127 rows=5767675 loops=1)
         Sort Key: rt5.feat, rt5.tigerfile
         ->  Seq Scan on rt5  (cost=0.00..127146.13 rows=5634813 width=55) (actual time=0.016..22538.958 rows=5635077 loops=1)
 Total runtime: 219632.580 ms
(9 rows)

fli=# fli=# fli=# 
fli=# explain analyze                                                                                                                       
    select *                                                                                                                                
      from (select * from rt4 where feat3 is not null) as rt4, rt5                                                                          
     where rt4.tigerfile = rt5.tigerfile                                                                                                    
       and feat = feat3;                                                                                                                    

fli-# fli-# fli-# fli-#                                                           QUERY PLAN             
-------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=1152466.47..1194789.77 rows=3296 width=100) (actual time=125982.562..145927.220 rows=153091 loops=1)
   Merge Cond: (("outer".feat3 = "inner".feat) AND ("outer".tigerfile = "inner".tigerfile))
   ->  Sort  (cost=106095.56..106443.67 rows=139247 width=45) (actual time=11729.319..11823.006 rows=153091 loops=1)
         Sort Key: tgr.rt4.feat3, tgr.rt4.tigerfile
         ->  Seq Scan on rt4  (cost=0.00..94198.79 rows=139247 width=45) (actual time=32.404..10893.373 rows=153091 loops=1)
               Filter: (feat3 IS NOT NULL)
   ->  Sort  (cost=1046370.92..1060457.95 rows=5634813 width=55) (actual time=114253.157..126650.225 rows=5767675 loops=1)
         Sort Key: rt5.feat, rt5.tigerfile
         ->  Seq Scan on rt5  (cost=0.00..127146.13 rows=5634813 width=55) (actual time=0.012..19253.431 rows=5635077 loops=1)
 Total runtime: 146480.294 ms
(10 rows)

fli=# fli=# 
fli=# 


Responses

pgsql-performance by date

Next:From: Ken EgervariDate: 2005-01-27 02:17:23
Subject: Performance problem with semi-large tables
Previous:From: Steve PoeDate: 2005-01-26 23:49:54
Subject: Re: Ideal disk setup for Postgresql 7.4?

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group