Query planner fails to calculate index usage - Postgres 7.3.2 on RedHat 7.3

From: willmington(at)gmx(dot)net (R(dot) Willmington)
To: pgsql-admin(at)postgresql(dot)org
Subject: Query planner fails to calculate index usage - Postgres 7.3.2 on RedHat 7.3
Date: 2004-11-03 08:50:57
Message-ID: e6568d.0411030050.494af7ab@posting.google.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Ladies and gentlemen,

I have a strange problem with index usage:

We have two identical database layouts in two different databases
in the same db server:

Postgres 7.3.2 Server

Database 1 Database 2
|- cms_contents |- cms_contents
|- cms_log |- cms_log

Nedless to say the tables have the same indices.

When running the following query, the planner
decides to use all suitable table indices in database 1, but for the
tables in database 2 it doesn't. This results in the query for
database 1 taking about 1 second, whereas for database 2 the query
takes about 4 minutes (!).

SELECT a.id, b.log_timestamp as timestamp
FROM cms_contents a, cms_log b
WHERE a.state_id = 'd'
AND a.parent_id NOT IN (
SELECT id FROM cms_contents
WHERE state_id = 'd'
AND language='de' and domain_id = 1
)
AND a.domain_id = 1
AND a.language = 'de'
AND a.id = b.content_id
AND b.log_action = 'deleteContent'
AND b.log_timestamp = (
SELECT max(log_timestamp)
FROM cms_log
WHERE content_id = a.id
AND b.domain_id = a.domain_id
AND b.language = a.language
)
ORDER BY timestamp

I have already analyzed, reindexed, vacuumed (FULL...) the tables and
both
databases with no results. Indices in database 2 exist and are not
broken, they are used for simple querys such as "select * from cms_log
where log_action ='XYZ'".
I have even restarted the postgres server to exclude any "opened -
curser" based problems.

Here is what i get when doing an expain for the big select query:

_______________________________________

Database 1 (it is obviously using the indices)
_______________________________________

Sort (cost=36615.14..36615.14 rows=1 width=36)
Sort Key: b.log_timestamp
-> Nested Loop (cost=0.00..36615.13 rows=1 width=36)
Join Filter: ("outer".log_timestamp = (subplan))
-> Index Scan using cms_log_log_action on cms_log b
(cost=0.00..451.79 rows=120 width=22)
Index Cond: (log_action = 'deleteContent'::character
varying)
-> Index Scan using pk__cms_contents on cms_contents a
(cost=0.00..213.55 rows=1 width=14)
Index Cond: ((a.id = "outer".content_id) AND
(a.domain_id = 1)
AND (a."language" = 'de'::bpchar))
Filter: ((state_id = 'd'::bpchar) AND (subplan))
SubPlan
-> Materialize (cost=207.58..207.58 rows=281
width=4)
-> Index Scan using cms_contents_state_id on
cms_contents (cost=0.00..207.58 rows=281
width=4)
Index Cond: (state_id = 'd'::bpchar)
Filter: (("language" = 'de'::bpchar) AND
(domain_id
= 1))
SubPlan
-> Aggregate (cost=86.80..86.80 rows=1 width=8)
-> Result (cost=0.00..86.73 rows=27 width=8)
One-Time Filter: (($1 = $2) AND ($3 = $4))
-> Index Scan using cms_log_content_id on
cms_log
(cost=0.00..86.73 rows=27 width=8)
Index Cond: (content_id = $0)

_______________________________________

Database 2 (it is obviously not using all indices)
_______________________________________

Sort (cost=337657.43..337657.43 rows=1 width=36)
Sort Key: b.log_timestamp
-> Nested Loop (cost=0.00..337657.42 rows=1 width=36)
Join Filter: ("outer".log_timestamp = (subplan))
-> Seq Scan on cms_log b (cost=0.00..2492.53 rows=1491
width=22)
Filter: (log_action = 'deleteContent'::character
varying)
-> Index Scan using pk__cms_contents on cms_contents a
(cost=0.00..172.78 rows=1 width=14)
Index Cond: ((a.id = "outer".content_id) AND
(a.domain_id = 1) AND (a."language" = 'de'::bpchar))
Filter: ((state_id = 'd'::bpchar) AND (subplan))
SubPlan
-> Seq Scan on cms_contents (cost=0.00..333.94
rows=2347 width=4)
Filter: ((state_id = 'd'::bpchar) AND
("language" = 'de'::bpchar) AND (domain_id = 1))
SubPlan
-> Aggregate (cost=51.95..51.95 rows=1 width=8)
-> Result (cost=0.00..51.89 rows=24 width=8)
One-Time Filter: (($1 = $2) AND ($3 = $4))
-> Index Scan using cms_log_content_id on
cms_log (cost=0.00..51.89 rows=24 width=8)
Index Cond: (content_id = $0)

_____________________________________________________________

It seems to me that the query planner has some sort of bug when
predicting the costs for the query.

If i set SET ENABLE_SEQSCAN = NO; before running the query in database
2,
this is what i get from the planner:

______________________________________________________________

Sort (cost=826278.50..826278.51 rows=1 width=36)
Sort Key: b.log_timestamp
-> Nested Loop (cost=0.00..826278.49 rows=1 width=36)
Join Filter: ("outer".log_timestamp = (subplan))
-> Index Scan using cms_log_log_action on cms_log b
(cost=0.00..3539.29 rows=1491 width=22)
Index Cond: (log_action = 'deleteContent'::character
varying)
-> Index Scan using pk__cms_contents on cms_contents a
(cost=0.00..499.72 rows=1 width=14)
Index Cond: ((a.id = "outer".content_id) AND
(a.domain_id = 1) AND (a."language" = 'de'::bpchar))
Filter: ((state_id = 'd'::bpchar) AND (subplan))
SubPlan
-> Materialize (cost=493.92..493.92 rows=2347
width=4)
-> Index Scan using cms_contents_state_id on
cms_contents (cost=0.00..493.92 rows=2347 width=4)
Index Cond: (state_id = 'd'::bpchar)
Filter: (("language" = 'de'::bpchar) AND
(domain_id = 1))
SubPlan
-> Aggregate (cost=51.95..51.95 rows=1 width=8)
-> Result (cost=0.00..51.89 rows=24 width=8)
One-Time Filter: (($1 = $2) AND ($3 = $4))
-> Index Scan using cms_log_content_id on
cms_log (cost=0.00..51.89 rows=24 width=8)
Index Cond: (content_id = $0)

______________________________________________________

So the planner predicts that the costs for the select with index usage
are about 3 times higher than without index usage, whereas in fact it
is a lot faster.

How can i improve the predicion? Am i to change the query?

Here are the table layouts:

______________________________________________

cms_contents:
______________________________________________

Column | Type | Modifiers
-----------------------+------------------------+-------------------------------
id | integer | not null
domain_id | integer | not null
language | character(2) | not null
state_id | character(1) | not null
type_name | character varying(50) | not null
parent_id | integer | not null
sort_children_by | character varying(100) | not null default
'order_hint'
show_children_in_tree | bit(1) | not null default
B'1'::"bit"
is_online | bit(1) |

Indexes: pk__cms_contents primary key btree (id, domain_id,
"language"),
cms_contents_children_index btree (domain_id, parent_id,
"language", state_id, is_online),
cms_contents_state_id btree (state_id)

______________________________________________

cms_log:
______________________________________________

Column | Type | Modifiers
---------------+-----------------------------+------------------------
content_id | integer | not null
domain_id | integer | not null
language | character(2) | not null
log_timestamp | timestamp without time zone | not null default now()
log_user | character varying(50) | not null
log_action | character varying(50) | not null
Indexes: cms_log_content_id btree (content_id),
cms_log_content_id_domain_action_language btree (content_id,
domain_id, "language", log_action),
cms_log_log_action btree (log_action),
cms_log_log_timestamp btree (log_timestamp)

Here are the table statistics:

database 1:
cms_contents: 13109 entries
cms_log: 119166 entries

database 2:
cms_contents: 10436 entries
cms_log: 105922 entries

Any advice appreciated,

Kind Regards,
R. Willmington

Browse pgsql-admin by date

  From Date Subject
Next Message Allen Smith 2004-11-03 19:39:23 Help need to restore dropped db
Previous Message Werner Bohl 2004-11-02 19:23:30 Postgresql beta4 rpms