Postgresql 9.0.7 weird planner decision (rows in plan close to reality but plan suboptimal)

From: Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Postgresql 9.0.7 weird planner decision (rows in plan close to reality but plan suboptimal)
Date: 2012-04-06 09:46:48
Message-ID: CAK-MWwQjqPhwES=1FYxTRYPTyAJpx5J+mFVtc+7U23SBUBZ3fw@mail.gmail.com
Lists: pgsql-general

Hi,

Today, on one of the databases under my management, I found a very strange
plan for a simple query.
PostgreSQL 9.0.7 on Linux,
random_page_cost=4
seq_page_cost=1

The query and plan:

db=# EXPLAIN (ANALYZE, COSTS, BUFFERS) select obj_id, obj_commented,p2o_id
FROM blog_post as obj
JOIN person2obj ON p2o_obj_obj_id = obj_id
JOIN person2obj_counters ON p2oc_id = p2o_id
WHERE obj_status_did = 1
AND obj_commented IS NOT NULL
AND obj_commented > now() - '7days'::interval
AND obj_commented > p2o_notified
AND p2o_notify = 't';

QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=6546.54..6584.69 rows=3 width=24) (actual time=86.262..1349.266 rows=1770 loops=1)
   Merge Cond: (person2obj_counters.p2oc_id = person2obj.p2o_id)
   Buffers: shared hit=1140491
   ->  Index Scan using pk_person2obj_counters on person2obj_counters  (cost=0.00..47110.95 rows=1212591 width=8) (actual time=0.008..997.948 rows=1212765 loops=1)
         Buffers: shared hit=1108452
   ->  Sort  (cost=6546.42..6546.98 rows=221 width=24) (actual time=85.877..88.373 rows=7870 loops=1)
         Sort Key: person2obj.p2o_id
         Sort Method: quicksort Memory: 807kB
         Buffers: shared hit=32039
         ->  Nested Loop  (cost=0.00..6537.82 rows=221 width=24) (actual time=0.097..80.129 rows=7870 loops=1)
               Buffers: shared hit=32039
               ->  Index Scan using i_blog_post_commented_active on blog_post obj  (cost=0.00..225.73 rows=1726 width=16) (actual time=0.028..17.957 rows=6010 loops=1)
                     Index Cond: ((obj_commented IS NOT NULL) AND (obj_commented > (now() - '7 days'::interval)))
                     Buffers: shared hit=6207
               ->  Index Scan using i_person2obj_obj_notified_subscribed on person2obj  (cost=0.00..3.64 rows=1 width=24) (actual time=0.006..0.009 rows=1 loops=6010)
                     Index Cond: ((person2obj.p2o_obj_obj_id = obj.obj_id) AND (obj.obj_commented > person2obj.p2o_notified))
                     Buffers: shared hit=25832
 Total runtime: 1349.767 ms

I don't understand why the database chose a merge join against the 1.2M-row table.
person2obj_counters has an index on p2oc_id (it's the primary key field).
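A quick way to double-check which indexes exist on that table (just a sketch,
table name as above; output omitted):

db=# SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'person2obj_counters';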

Switching to the fast nested-loop plan can be forced with set random_page_cost=10:

db=# set random_page_cost to 10;
SET
db=# EXPLAIN (ANALYZE, COSTS, BUFFERS) select obj_id, obj_commented,p2o_id
FROM blog_post as obj
JOIN person2obj ON p2o_obj_obj_id = obj_id
JOIN person2obj_counters ON p2oc_id = p2o_id
WHERE obj_status_did = 1
AND obj_commented IS NOT NULL
AND obj_commented > now() - '7days'::interval
AND obj_commented > p2o_notified
AND p2o_notify = 't';

QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..14810.38 rows=3 width=24) (actual time=16.910..115.110 rows=1758 loops=1)
   Buffers: shared hit=57403
   ->  Nested Loop  (cost=0.00..14616.37 rows=221 width=24) (actual time=0.088..82.342 rows=7858 loops=1)
         Buffers: shared hit=32046
         ->  Index Scan using i_blog_post_commented_active on blog_post obj  (cost=0.00..273.70 rows=1725 width=16) (actual time=0.029..16.260 rows=6009 loops=1)
               Index Cond: ((obj_commented IS NOT NULL) AND (obj_commented > (now() - '7 days'::interval)))
               Buffers: shared hit=6222
         ->  Index Scan using i_person2obj_obj_notified_subscribed on person2obj  (cost=0.00..8.30 rows=1 width=24) (actual time=0.007..0.010 rows=1 loops=6009)
               Index Cond: ((person2obj.p2o_obj_obj_id = obj.obj_id) AND (obj.obj_commented > person2obj.p2o_notified))
               Buffers: shared hit=25824
   ->  Index Scan using pk_person2obj_counters on person2obj_counters  (cost=0.00..0.87 rows=1 width=8) (actual time=0.004..0.004 rows=0 loops=7858)
         Index Cond: (person2obj_counters.p2oc_id = person2obj.p2o_id)
         Buffers: shared hit=25357
 Total runtime: 115.465 ms
(14 rows)
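For comparison, I suppose the same nested-loop plan could also be reached
per-session without touching page costs by disabling merge joins; a sketch of
what I mean (same query as above, not something I actually ran here):

db=# BEGIN;
db=# SET LOCAL enable_mergejoin = off;
db=# EXPLAIN (ANALYZE, COSTS, BUFFERS) select obj_id, obj_commented,p2o_id
FROM blog_post as obj
JOIN person2obj ON p2o_obj_obj_id = obj_id
JOIN person2obj_counters ON p2oc_id = p2o_id
WHERE obj_status_did = 1
AND obj_commented IS NOT NULL
AND obj_commented > now() - '7days'::interval
AND obj_commented > p2o_notified
AND p2o_notify = 't';
db=# ROLLBACK;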

OK, so what I see: the database thinks it will need to join 221 rows from the
previous level against the person2obj_counters table.
And somehow Pg manages to calculate that a full index scan over the 1.2M
entries of the person2obj_counters table is faster than nested-loop probes
for 221 values.
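To see what the planner knows about the distribution of the join keys, one
could look at pg_stats (again only a sketch; table and column names are the
ones from the query above):

db=# SELECT tablename, attname, n_distinct, correlation
FROM pg_stats
WHERE (tablename, attname) IN (('person2obj', 'p2o_id'),
                               ('person2obj_counters', 'p2oc_id'));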

What looks very suspicious is that the full inner index scan plus the merge
join itself add only about 38 points to the total cost (6584.69 - 6546.98 = 37.71),
even though the inner index scan alone is costed at 47110.95:

 Merge Join  (cost=6546.54..6584.69 rows=3 width=24) (actual time=86.262..1349.266 rows=1770 loops=1)
 ...
   ->  Index Scan using pk_person2obj_counters on person2obj_counters  (cost=0.00..47110.95 rows=1212591 width=8) (actual time=0.008..997.948 rows=1212765 loops=1)
 ...
   ->  Sort  (cost=6546.42..6546.98 rows=221 width=24) (actual time=85.877..88.373 rows=7870 loops=1)

How can that be?

--
Maxim Boguk
Senior Postgresql DBA.

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

Skype: maxim.boguk
Jabber: maxim(dot)boguk(at)gmail(dot)com

LinkedIn profile: http://nz.linkedin.com/in/maximboguk
"If they can send one man to the moon... why can't they send them all?"

МойКруг: http://mboguk.moikrug.ru/
"People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage."
