Skip site navigation (1) Skip section navigation (2)

Planner question - wrong row count estimation

From: Michael Korbakov <m(dot)korbakov(at)postindustria(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Planner question - wrong row count estimation
Date: 2009-09-18 14:21:29
Message-ID: 724D5855-3F9E-4B6A-AB74-BAE06218DC5D@postindustria.com (view raw or flat)
Thread:
Lists: pgsql-performance
Hi everybody.

I'm having an issues with wrong plan for query in PostgreSQL (version  
8.3). EXPLAIN ANALYZE shows that there're a lot of places where  
planner estimates row count totally wrong, like 1 instead of 12000+.
default_statistics_target variable is set to 100, and I tried to run  
VACUUM ANALYZE many times.

Because of wrong estimation query planner uses nested loops instead of  
hash joins and it results in very bad performance. Disabling nested  
loops helps, but I want to understand what happens there and try to  
avoid it in future.

Could you help me with it? Query and plan are below.

Thank you in advance,

Michael Korbakov

SELECT wide_stats.*, revenue * rev_share AS net_revenue, revenue *  
(rev_share - partner_rev_share) AS gross_revenue,
            (CASE clicks WHEN 0 THEN 0 ELSE revenue * rev_share /  
clicks END) as net_rpc, (CASE clicks WHEN 0 THEN 0 ELSE revenue *  
(rev_share - partner_rev_share) / clicks END) AS gross_rpc,
            rev_share * wide_stats.ecpm AS net_ecpm, (rev_share -  
partner_rev_share) * wide_stats.ecpm AS gross_ecpm,
            partner_rev_share * revenue AS partner_revenue
     FROM

     (SELECT stats.id, stats.date, stats.domain_id, (CASE WHEN  
stats.partner_id = 1 OR top_subparent = 1 THEN title ELSE real_title  
END) AS title,
             stats.pageviews, stats.subsequent_searches,  
stats.searches, stats.clicks, stats.revenue, stats.country,  
stats.approved,
             stats.partner_id, shares.rev_share, (CASE top_subparent  
WHEN 1 THEN 0 ELSE partners_shares.rev_share END) AS  
partner_rev_share, stats.ctr, stats.rpc,
             (CASE stats.searches WHEN 0 THEN 0 ELSE 1000 * revenue /  
searches END) AS ecpm,
             (SELECT name FROM partners WHERE id = top_subparent) AS  
owner,
             subparents.top_subparent
     FROM reports.daily_domain_reports AS stats
     LEFT JOIN materialized_top_subparents AS subparents ON  
stats.partner_id = subparents.partner_id AND subparents.parent_id = 1
     LEFT JOIN reports.monthly_shares_with_parents_materialized AS  
shares ON date_part('year'::text, stats.date) = shares.year AND  
date_part('month'::text, stats.date) = shares.month AND  
shares.partner_id = 1
     LEFT JOIN reports.monthly_shares_with_parents_materialized AS  
partners_shares ON date_part('year'::text, stats.date) =  
partners_shares.year AND date_part('month'::text, stats.date) =  
partners_shares.month AND partners_shares.partner_id = top_subparent
     WHERE stats.partner_id = 1 OR top_subparent IN (SELECT  
partners.id FROM partners WHERE parent_id = 1)

) AS wide_stats WHERE date >= '2009-08-01' AND date < '2009-09-02';


Nested Loop  (cost=11.80..172.48 rows=1 width=94) (actual  
time=93.792..14485.092 rows=12745 loops=1)
   ->  Nested Loop  (cost=11.80..168.94 rows=1 width=56) (actual  
time=93.739..13342.157 rows=12745 loops=1)
         ->  Nested Loop  (cost=11.80..168.62 rows=1 width=60) (actual  
time=93.734..13227.265 rows=12745 loops=1)
               Join Filter: (COALESCE((domain_stats.date <=  
domain_mappings.end_date), true) AND ((shares.year)::double precision  
= date_part('year'::text, (domain_stats.date)::timestamp without time  
zone)) AND ((shares.month)::double precision = date_part 
('month'::text, (domain_stats.date)::timestamp without time zone)))
               ->  Nested Loop  (cost=11.80..31.74 rows=1 width=48)  
(actual time=0.258..26.950 rows=6069 loops=1)
                     Join Filter: ((domain_mappings.partner_id = 1) OR  
(hashed subplan))
                     ->  Nested Loop  (cost=8.50..27.28 rows=1  
width=32) (actual time=0.114..9.298 rows=567 loops=1)
                           ->  Hash Join  (cost=8.50..25.11 rows=1  
width=28) (actual time=0.092..1.864 rows=560 loops=1)
                                 Hash Cond:  
(((partners_shares.year)::double precision = (shares.year)::double  
precision) AND ((partners_shares.month)::double precision =  
(shares.month)::double precision))
                                 ->  Seq Scan on  
monthly_shares_with_parents_materialized partners_shares   
(cost=0.00..9.60 rows=560 width=16) (actual time=0.009..0.336 rows=560  
loops=1)
                                 ->  Hash  (cost=8.39..8.39 rows=7  
width=12) (actual time=0.059..0.059 rows=7 loops=1)
                                       ->  Bitmap Heap Scan on  
monthly_shares_with_parents_materialized shares  (cost=4.30..8.39  
rows=7 width=12) (actual time=0.033..0.041 rows=7 loops=1)
                                             Recheck Cond: (partner_id  
= 1)
                                             ->  Bitmap Index Scan on  
monthly_shares_with_parents_materialized_pkey  (cost=0.00..4.30 rows=7  
width=0) (actual time=0.027..0.027 rows=7 loops=1)
                                                   Index Cond:  
(partner_id = 1)
                           ->  Index Scan using  
materialized_top_subparents_pkey on materialized_top_subparents  
subparents  (cost=0.00..2.16 rows=1 width=8) (actual time=0.010..0.011  
rows=1 loops=560)
                                 Index Cond: ((subparents.parent_id =  
1) AND (subparents.top_subparent = partners_shares.partner_id))
                     ->  Index Scan using  
ix_domain_mappings_partner_id on domain_mappings  (cost=0.00..0.97  
rows=11 width=16) (actual time=0.004..0.012 rows=11 loops=567)
                           Index Cond: (domain_mappings.partner_id =  
subparents.partner_id)
                     SubPlan
                       ->  Seq Scan on partners  (cost=0.00..3.12  
rows=71 width=4) (actual time=0.005..0.059 rows=71 loops=1)
                             Filter: (parent_id = 1)
               ->  Index Scan using uix_date_domain_country on  
domain_stats  (cost=0.00..136.65 rows=6 width=36) (actual  
time=0.653..2.089 rows=15 loops=6069)
                     Index Cond: ((domain_stats.date >=  
'2009-08-01'::date) AND (domain_stats.date < '2009-09-02'::date) AND  
(domain_stats.date >= domain_mappings.start_date) AND  
(domain_stats.domain_id = domain_mappings.domain_id))
         ->  Index Scan using partners_pkey on partners   
(cost=0.00..0.31 rows=1 width=4) (actual time=0.006..0.007 rows=1  
loops=12745)
               Index Cond: (public.partners.id =  
domain_mappings.partner_id)
   ->  Index Scan using domains_pkey on domains  (cost=0.00..0.29  
rows=1 width=46) (actual time=0.007..0.008 rows=1 loops=12745)
         Index Cond: (domains.id = domain_stats.domain_id)
   SubPlan
     ->  Seq Scan on partners  (cost=0.00..3.12 rows=1 width=3)  
(actual time=0.044..0.064 rows=1 loops=12745)
           Filter: (id = $0)
Total runtime: 14491.142 ms

pgsql-performance by date

Next:From: Tom LaneDate: 2009-09-18 15:42:31
Subject: Re: Different query plans for the same query
Previous:From: Robert HaasDate: 2009-09-18 14:01:19
Subject: Re: Use of BETWEEN with identical values

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group