Planner question - wrong row count estimation

From: Michael Korbakov <m(dot)korbakov(at)postindustria(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Planner question - wrong row count estimation
Date: 2009-09-18 14:21:29
Message-ID: 724D5855-3F9E-4B6A-AB74-BAE06218DC5D@postindustria.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi everybody.

I'm having an issues with wrong plan for query in PostgreSQL (version
8.3). EXPLAIN ANALYZE shows that there're a lot of places where
planner estimates row count totally wrong, like 1 instead of 12000+.
default_statistics_target variable is set to 100, and I tried to run
VACUUM ANALYZE many times.

Because of wrong estimation query planner uses nested loops instead of
hash joins and it results in very bad performance. Disabling nested
loops helps, but I want to understand what happens there and try to
avoid it in future.

Could you help me with it? Query and plan are below.

Thank you in advance,

Michael Korbakov

SELECT wide_stats.*, revenue * rev_share AS net_revenue, revenue *
(rev_share - partner_rev_share) AS gross_revenue,
(CASE clicks WHEN 0 THEN 0 ELSE revenue * rev_share /
clicks END) as net_rpc, (CASE clicks WHEN 0 THEN 0 ELSE revenue *
(rev_share - partner_rev_share) / clicks END) AS gross_rpc,
rev_share * wide_stats.ecpm AS net_ecpm, (rev_share -
partner_rev_share) * wide_stats.ecpm AS gross_ecpm,
partner_rev_share * revenue AS partner_revenue
FROM

(SELECT stats.id, stats.date, stats.domain_id, (CASE WHEN
stats.partner_id = 1 OR top_subparent = 1 THEN title ELSE real_title
END) AS title,
stats.pageviews, stats.subsequent_searches,
stats.searches, stats.clicks, stats.revenue, stats.country,
stats.approved,
stats.partner_id, shares.rev_share, (CASE top_subparent
WHEN 1 THEN 0 ELSE partners_shares.rev_share END) AS
partner_rev_share, stats.ctr, stats.rpc,
(CASE stats.searches WHEN 0 THEN 0 ELSE 1000 * revenue /
searches END) AS ecpm,
(SELECT name FROM partners WHERE id = top_subparent) AS
owner,
subparents.top_subparent
FROM reports.daily_domain_reports AS stats
LEFT JOIN materialized_top_subparents AS subparents ON
stats.partner_id = subparents.partner_id AND subparents.parent_id = 1
LEFT JOIN reports.monthly_shares_with_parents_materialized AS
shares ON date_part('year'::text, stats.date) = shares.year AND
date_part('month'::text, stats.date) = shares.month AND
shares.partner_id = 1
LEFT JOIN reports.monthly_shares_with_parents_materialized AS
partners_shares ON date_part('year'::text, stats.date) =
partners_shares.year AND date_part('month'::text, stats.date) =
partners_shares.month AND partners_shares.partner_id = top_subparent
WHERE stats.partner_id = 1 OR top_subparent IN (SELECT
partners.id FROM partners WHERE parent_id = 1)

) AS wide_stats WHERE date >= '2009-08-01' AND date < '2009-09-02';

Nested Loop (cost=11.80..172.48 rows=1 width=94) (actual
time=93.792..14485.092 rows=12745 loops=1)
-> Nested Loop (cost=11.80..168.94 rows=1 width=56) (actual
time=93.739..13342.157 rows=12745 loops=1)
-> Nested Loop (cost=11.80..168.62 rows=1 width=60) (actual
time=93.734..13227.265 rows=12745 loops=1)
Join Filter: (COALESCE((domain_stats.date <=
domain_mappings.end_date), true) AND ((shares.year)::double precision
= date_part('year'::text, (domain_stats.date)::timestamp without time
zone)) AND ((shares.month)::double precision = date_part
('month'::text, (domain_stats.date)::timestamp without time zone)))
-> Nested Loop (cost=11.80..31.74 rows=1 width=48)
(actual time=0.258..26.950 rows=6069 loops=1)
Join Filter: ((domain_mappings.partner_id = 1) OR
(hashed subplan))
-> Nested Loop (cost=8.50..27.28 rows=1
width=32) (actual time=0.114..9.298 rows=567 loops=1)
-> Hash Join (cost=8.50..25.11 rows=1
width=28) (actual time=0.092..1.864 rows=560 loops=1)
Hash Cond:
(((partners_shares.year)::double precision = (shares.year)::double
precision) AND ((partners_shares.month)::double precision =
(shares.month)::double precision))
-> Seq Scan on
monthly_shares_with_parents_materialized partners_shares
(cost=0.00..9.60 rows=560 width=16) (actual time=0.009..0.336 rows=560
loops=1)
-> Hash (cost=8.39..8.39 rows=7
width=12) (actual time=0.059..0.059 rows=7 loops=1)
-> Bitmap Heap Scan on
monthly_shares_with_parents_materialized shares (cost=4.30..8.39
rows=7 width=12) (actual time=0.033..0.041 rows=7 loops=1)
Recheck Cond: (partner_id
= 1)
-> Bitmap Index Scan on
monthly_shares_with_parents_materialized_pkey (cost=0.00..4.30 rows=7
width=0) (actual time=0.027..0.027 rows=7 loops=1)
Index Cond:
(partner_id = 1)
-> Index Scan using
materialized_top_subparents_pkey on materialized_top_subparents
subparents (cost=0.00..2.16 rows=1 width=8) (actual time=0.010..0.011
rows=1 loops=560)
Index Cond: ((subparents.parent_id =
1) AND (subparents.top_subparent = partners_shares.partner_id))
-> Index Scan using
ix_domain_mappings_partner_id on domain_mappings (cost=0.00..0.97
rows=11 width=16) (actual time=0.004..0.012 rows=11 loops=567)
Index Cond: (domain_mappings.partner_id =
subparents.partner_id)
SubPlan
-> Seq Scan on partners (cost=0.00..3.12
rows=71 width=4) (actual time=0.005..0.059 rows=71 loops=1)
Filter: (parent_id = 1)
-> Index Scan using uix_date_domain_country on
domain_stats (cost=0.00..136.65 rows=6 width=36) (actual
time=0.653..2.089 rows=15 loops=6069)
Index Cond: ((domain_stats.date >=
'2009-08-01'::date) AND (domain_stats.date < '2009-09-02'::date) AND
(domain_stats.date >= domain_mappings.start_date) AND
(domain_stats.domain_id = domain_mappings.domain_id))
-> Index Scan using partners_pkey on partners
(cost=0.00..0.31 rows=1 width=4) (actual time=0.006..0.007 rows=1
loops=12745)
Index Cond: (public.partners.id =
domain_mappings.partner_id)
-> Index Scan using domains_pkey on domains (cost=0.00..0.29
rows=1 width=46) (actual time=0.007..0.008 rows=1 loops=12745)
Index Cond: (domains.id = domain_stats.domain_id)
SubPlan
-> Seq Scan on partners (cost=0.00..3.12 rows=1 width=3)
(actual time=0.044..0.064 rows=1 loops=12745)
Filter: (id = $0)
Total runtime: 14491.142 ms

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2009-09-18 15:42:31 Re: Different query plans for the same query
Previous Message Robert Haas 2009-09-18 14:01:19 Re: Use of BETWEEN with identical values