Poor planner estimation with partitioned tables

From: "Mason Hale" <masonhale(at)gmail(dot)com>
To: "pgsql-general General" <pgsql-general(at)postgresql(dot)org>
Subject: Poor planner estimation with partitioned tables
Date: 2008-08-28 16:23:28
Message-ID: 8bca3aa10808280923v7868fa79u1c985e65238d7a08@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

We have a table "entry" that is partitioned into two sub-tables named
"entry_part_new" and "entry_part_old", as described here:
http://www.postgresql.org/docs/8.3/interactive/ddl-partitioning.html

We find that in some cases, when joining to a partitioned table, the planner
makes wilding inaccurate estimates of the number of rows that will be
returned from the partitioned table.

This is on PostgreSql 8.3.3 with recently analyzed tables. The
sql_inheritance and constraint_exclusion settings are on, and both
sub-tables have non-overlapping check constraints on the primary key column,
id.

Here is an example query and the explain analyze plan. The thing to note
here is that the planner is estimating 5.9 millions rows in the outer nested
loop, when in fact only 16 rows will be affected.

prod_2=> explain analyze
SELECT e.*
FROM publication_entry_pin as pep
INNER JOIN entry e
ON (pep.entry_id = e.id)
WHERE pep.publication_id = 850
prod_2-> \g

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=3.34..440.53 rows=5949391 width=1276) (actual
time=0.116..0.438 rows=16 loops=1)
Join Filter: (pep.entry_id = e.id)
-> Bitmap Heap Scan on publication_entry_pin pep (cost=3.34..19.45
rows=11 width=4) (actual time=0.065..0.133 rows=16 loops=1)
Recheck Cond: (publication_id = 850)
-> Bitmap Index Scan on
index_publication_entry_pin_on_publication_id_and_place_index
(cost=0.00..3.33 rows=11 width=0) (actual time=0.052..0.052 rows=237
loops=1)
Index Cond: (publication_id = 850)
-> Append (cost=0.00..38.24 rows=3 width=777) (actual time=0.015..0.016
rows=1 loops=16)
-> Index Scan using entry_pkey1 on entry e (cost=0.00..3.27
rows=1 width=777) (actual time=0.001..0.001 rows=0 loops=16)
Index Cond: (e.id = pep.entry_id)
-> Index Scan using entry_pkey on entry_part_old e
(cost=0.00..20.09 rows=1 width=725) (actual time=0.004..0.004 rows=0
loops=16)
Index Cond: (e.id = pep.entry_id)
-> Index Scan using entry_part_new_pkey on entry_part_new e
(cost=0.00..14.88 rows=1 width=742) (actual time=0.009..0.010 rows=1
loops=16)
Index Cond: (e.id = pep.entry_id)
Total runtime: 0.519 ms
(14 rows)

If I modify the query to directly query the entry_part_new partition table,
the plan is much more accurate, estimating 11 rows:

prod_2=> \p
explain analyze
SELECT e.*
FROM publication_entry_pin as pep
INNER JOIN entry_part_new e
ON (pep.entry_id = e.id)
WHERE pep.publication_id = 850

prod_2=> \g

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=3.34..183.29 rows=11 width=742) (actual
time=0.102..0.323 rows=16 loops=1)
-> Bitmap Heap Scan on publication_entry_pin pep (cost=3.34..19.45
rows=11 width=4) (actual time=0.078..0.125 rows=16 loops=1)
Recheck Cond: (publication_id = 850)
-> Bitmap Index Scan on
index_publication_entry_pin_on_publication_id_and_place_index
(cost=0.00..3.33 rows=11 width=0) (actual time=0.060..0.060 rows=237
loops=1)
Index Cond: (publication_id = 850)
-> Index Scan using entry_part_new_pkey on entry_part_new e
(cost=0.00..14.88 rows=1 width=742) (actual time=0.009..0.010 rows=1
loops=16)
Index Cond: (e.id = pep.entry_id)
Total runtime: 0.381 ms
(8 rows)

prod_2=>

This mis-estimation at times results in dramatically slower queries,
especially when more joins are involved. Because the estimated number of
rows is so high, the planner more often selects a seq scan plan in more
situations, when it really shouldn't.

So, my questions are:

1.) Is this expected behavior of the partitioning feature? If so, it seems
to be a major limitation.

2.) What can I do if I need to use partitioned tables, to help the planner
make better estimates?
So far, we've been adding application logic to dynamically re-write the
queries to directly query the partitioned sub-tables, in effect,
side-stepping Postgres' built-in partitioning feature altogether.

thanks,
Mason

Browse pgsql-general by date

  From Date Subject
Next Message Andrew Sullivan 2008-08-28 16:27:43 Re: Vaccuum best practice: cronjob or autovaccuum?
Previous Message Joao Ferreira gmail 2008-08-28 16:15:06 Re: Vaccuum best practice: cronjob or autovaccuum?