
Re: multi-layered view join performance oddities

From: Svenne Krap <svenne@krap.dk>
To: "Steinar H. Gunderson" <sgunderson@bigfoot.com>
Cc: PgSQL Performance <pgsql-performance@postgresql.org>
Subject: Re: multi-layered view join performance oddities
Date: 2005-10-30 18:33:03
Message-ID: 436511DF.5020800@krap.dk
Lists: pgsql-performance
Hi.

Your suggestion of disabling nested loops really worked well:

rkr=# set enable_nestloop=false;
SET
rkr=# explain analyze select * from ord_result_pct_pretty ;
                                                                        QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=230.06..337.49 rows=1 width=174) (actual time=21.893..42.356 rows=2250 loops=1)
   Hash Cond: (("outer".dataset_id = "inner".dataset_id) AND ("outer".nb_property_type_id = "inner".nb_property_type_id))
   ->  Hash Join  (cost=56.94..164.10 rows=26 width=93) (actual time=5.073..17.906 rows=2532 loops=1)
         Hash Cond: ("outer".dataset_id = "inner".id)
         ->  Hash Join  (cost=55.54..161.63 rows=161 width=57) (actual time=4.996..14.775 rows=2532 loops=1)
               Hash Cond: ("outer".institut = "inner".id)
               ->  Append  (cost=54.38..121.72 rows=2476 width=44) (actual time=4.964..11.827 rows=2532 loops=1)
                     ->  HashAggregate  (cost=54.38..57.20 rows=226 width=16) (actual time=4.964..5.174 rows=282 loops=1)
                           ->  Seq Scan on ord_entrydata_current  (cost=0.00..37.50 rows=2250 width=16) (actual time=0.002..1.305 rows=2250 loops=1)
                     ->  Subquery Scan "*SELECT* 2"  (cost=0.00..60.00 rows=2250 width=20) (actual time=0.009..4.948 rows=2250 loops=1)
                           ->  Seq Scan on ord_entrydata_current  (cost=0.00..37.50 rows=2250 width=20) (actual time=0.003..2.098 rows=2250 loops=1)
               ->  Hash  (cost=1.13..1.13 rows=13 width=17) (actual time=0.022..0.022 rows=13 loops=1)
                     ->  Seq Scan on groups g  (cost=0.00..1.13 rows=13 width=17) (actual time=0.003..0.013 rows=13 loops=1)
         ->  Hash  (cost=1.32..1.32 rows=32 width=36) (actual time=0.070..0.070 rows=32 loops=1)
               ->  Seq Scan on ord_dataset od  (cost=0.00..1.32 rows=32 width=36) (actual time=0.009..0.043 rows=32 loops=1)
                     Filter: is_visible
   ->  Hash  (cost=173.07..173.07 rows=10 width=97) (actual time=15.472..15.472 rows=256 loops=1)
         ->  Hash Join  (cost=166.15..173.07 rows=10 width=97) (actual time=14.666..15.203 rows=256 loops=1)
               Hash Cond: ("outer".nb_property_type_id = "inner".id)
               ->  HashAggregate  (cost=165.05..168.15 rows=248 width=40) (actual time=14.619..14.849 rows=288 loops=1)
                     ->  Append  (cost=54.38..121.72 rows=2476 width=44) (actual time=5.012..11.130 rows=2532 loops=1)
                           ->  HashAggregate  (cost=54.38..57.20 rows=226 width=16) (actual time=5.011..5.222 rows=282 loops=1)
                                 ->  Seq Scan on ord_entrydata_current  (cost=0.00..37.50 rows=2250 width=16) (actual time=0.001..1.261 rows=2250 loops=1)
                           ->  Subquery Scan "*SELECT* 2"  (cost=0.00..60.00 rows=2250 width=20) (actual time=0.010..4.308 rows=2250 loops=1)
                                 ->  Seq Scan on ord_entrydata_current  (cost=0.00..37.50 rows=2250 width=20) (actual time=0.002..1.694 rows=2250 loops=1)
               ->  Hash  (cost=1.08..1.08 rows=8 width=57) (actual time=0.026..0.026 rows=8 loops=1)
                     ->  Seq Scan on nb_property_type npt  (cost=0.00..1.08 rows=8 width=57) (actual time=0.004..0.019 rows=8 loops=1)
 Total runtime: 43.297 ms
(28 rows)
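
For now I can at least limit the workaround to a single transaction instead of the whole session. SET LOCAL reverts when the transaction ends, so everything else keeps the normal planner behaviour:

BEGIN;
SET LOCAL enable_nestloop = false;   -- reverts automatically at COMMIT/ROLLBACK
SELECT * FROM ord_result_pct_pretty;
COMMIT;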

Now the whole question becomes: how do I get the planner to make a 
better estimate of the number of rows returned?
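
One thing I can try on my side (just a sketch, not something suggested in the thread; the column names are my guesses from the plan above) is raising the per-column statistics targets on the join columns and re-analyzing, so the planner gets finer histograms to estimate the join selectivity from:

ALTER TABLE ord_entrydata_current ALTER COLUMN dataset_id SET STATISTICS 100;
ALTER TABLE ord_entrydata_current ALTER COLUMN nb_property_type_id SET STATISTICS 100;
ANALYZE ord_entrydata_current;    -- rebuild the statistics with the new targets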

I am not sure I can follow your moving-the-UNION-ALL-further-out 
advice, as I see no other place where the two datasets could be unioned.
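
If I understand the idea correctly, it would mean a rewrite roughly like the following, where the view names are made up for illustration. Instead of joining everything against the unioned result,

SELECT ... FROM (SELECT ... FROM part_one
                 UNION ALL
                 SELECT ... FROM part_two) u
JOIN groups g ON u.institut = g.id;

one would push the join into each branch and union afterwards, so the planner never has to estimate a join against an Append node:

SELECT ... FROM part_one p1 JOIN groups g ON p1.institut = g.id
UNION ALL
SELECT ... FROM part_two p2 JOIN groups g ON p2.institut = g.id;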

Maybe one of the core devs knows where to fiddle :)

Svenne

Steinar H. Gunderson wrote:

>On Sun, Oct 30, 2005 at 06:16:04PM +0100, Svenne Krap wrote:
>
>> Nested Loop  (cost=223.09..338.61 rows=1 width=174) (actual time=20.213..721.361 rows=2250 loops=1)
>>   Join Filter: (("outer".dataset_id = "inner".dataset_id) AND ("outer".nb_property_type_id = "inner".nb_property_type_id))
>>   ->  Hash Join  (cost=58.04..164.26 rows=1 width=150) (actual time=5.510..22.088 rows=2250 loops=1)
>
>There's horrible misestimation here. It expects one row and thus starts a
>nested loop, but gets 2250. No wonder it's slow :-)
>
>The misestimation can be traced all the way down here:
>
>>         Hash Cond: ("outer".institut = "inner".id)
>>         ->  Hash Join  (cost=56.88..163.00 rows=16 width=137) (actual time=5.473..19.165 rows=2250 loops=1)
>>               Hash Cond: ("outer".dataset_id = "inner".id)
>>               ->  Hash Join  (cost=55.48..160.95 rows=99 width=101) (actual time=5.412..16.264 rows=2250 loops=1)
>
>where the planner misestimates the selectivity of your join (it estimates 99
>rows, and there are 2250).
>
>I've had problems joining with Append nodes in the past, and solved the
>problem by moving the UNION ALL a bit out, but I'm not sure if it's a very
>good general solution, or a solution to your problems here.
>
>If all else fails, you could "set enable_nestloop=false", but that is not a
>good idea in the long run, I'd guess -- it's much better to make sure the
>planner has good estimates and let it do the correct decisions from there.
>
>/* Steinar */
