Skip site navigation (1) Skip section navigation (2)

Query planner fails to calculate index usage - Postgres 7.3.2 on RedHat 7.3

From: willmington(at)gmx(dot)net (R(dot) Willmington)
To: pgsql-admin(at)postgresql(dot)org
Subject: Query planner fails to calculate index usage - Postgres 7.3.2 on RedHat 7.3
Date: 2004-11-03 08:50:57
Message-ID: e6568d.0411030050.494af7ab@posting.google.com (view raw or flat)
Thread:
Lists: pgsql-admin
Ladies and gentlemen,

I have a strange problem with index usage:

We have two identical database layouts in two different databases 
in the same db server:

                 Postgres 7.3.2 Server

           Database 1              Database 2
          |- cms_contents          |- cms_contents
          |- cms_log               |- cms_log
            
Nedless to say the tables have the same indices.

When running the following query, the planner 
decides to use all suitable table indices in database 1, but for the
tables in database 2 it doesn't. This results in the query for
database 1 taking about 1 second, whereas for database 2 the query
takes about 4 minutes (!).

SELECT a.id, b.log_timestamp as timestamp 
       FROM cms_contents a, cms_log b
       WHERE a.state_id = 'd' 
             AND a.parent_id NOT IN (
                SELECT id FROM cms_contents 
                 WHERE state_id = 'd' 
                 AND language='de' and domain_id = 1
                ) 
             AND a.domain_id = 1 
             AND a.language = 'de' 
             AND a.id = b.content_id 
             AND b.log_action = 'deleteContent' 
             AND b.log_timestamp = (
                   SELECT max(log_timestamp) 
                    FROM cms_log 
                    WHERE content_id = a.id 
                    AND b.domain_id = a.domain_id 
                    AND b.language = a.language
                   )
             ORDER BY timestamp


I have already analyzed, reindexed, vacuumed (FULL...) the tables and
both
databases with no results. Indices in database 2 exist and are not
broken, they are used for simple querys such as "select * from cms_log
where log_action ='XYZ'".
I have even restarted the postgres server to exclude any "opened -
curser" based problems.

Here is what i get when doing an expain for the big select query:

_______________________________________

Database 1 (it is obviously using the indices)
_______________________________________


 Sort  (cost=36615.14..36615.14 rows=1 width=36)
   Sort Key: b.log_timestamp
   ->  Nested Loop  (cost=0.00..36615.13 rows=1 width=36)
         Join Filter: ("outer".log_timestamp = (subplan))
         ->  Index Scan using cms_log_log_action on cms_log b  
              (cost=0.00..451.79 rows=120 width=22)
               Index Cond: (log_action = 'deleteContent'::character
varying)
         ->  Index Scan using pk__cms_contents on cms_contents a  
              (cost=0.00..213.55 rows=1 width=14)
               Index Cond: ((a.id = "outer".content_id) AND
(a.domain_id = 1)
               AND (a."language" = 'de'::bpchar))
               Filter: ((state_id = 'd'::bpchar) AND (subplan))
               SubPlan
                 ->  Materialize  (cost=207.58..207.58 rows=281
width=4)
                       ->  Index Scan using cms_contents_state_id on 
                       cms_contents  (cost=0.00..207.58 rows=281
width=4)
                             Index Cond: (state_id = 'd'::bpchar)
                             Filter: (("language" = 'de'::bpchar) AND
(domain_id
                                       = 1))
         SubPlan
           ->  Aggregate  (cost=86.80..86.80 rows=1 width=8)
                 ->  Result  (cost=0.00..86.73 rows=27 width=8)
                       One-Time Filter: (($1 = $2) AND ($3 = $4))
                       ->  Index Scan using cms_log_content_id on
cms_log
                            (cost=0.00..86.73 rows=27 width=8)
                             Index Cond: (content_id = $0)



_______________________________________

Database 2 (it is obviously not using all indices)
_______________________________________

 Sort  (cost=337657.43..337657.43 rows=1 width=36)
   Sort Key: b.log_timestamp
   ->  Nested Loop  (cost=0.00..337657.42 rows=1 width=36)
         Join Filter: ("outer".log_timestamp = (subplan))
         ->  Seq Scan on cms_log b  (cost=0.00..2492.53 rows=1491
width=22)
               Filter: (log_action = 'deleteContent'::character
varying)
         ->  Index Scan using pk__cms_contents on cms_contents a 
(cost=0.00..172.78 rows=1 width=14)
               Index Cond: ((a.id = "outer".content_id) AND
(a.domain_id = 1) AND (a."language" = 'de'::bpchar))
               Filter: ((state_id = 'd'::bpchar) AND (subplan))
               SubPlan
                 ->  Seq Scan on cms_contents  (cost=0.00..333.94
rows=2347 width=4)
                       Filter: ((state_id = 'd'::bpchar) AND
("language" = 'de'::bpchar) AND (domain_id = 1))
         SubPlan
           ->  Aggregate  (cost=51.95..51.95 rows=1 width=8)
                 ->  Result  (cost=0.00..51.89 rows=24 width=8)
                       One-Time Filter: (($1 = $2) AND ($3 = $4))
                       ->  Index Scan using cms_log_content_id on
cms_log  (cost=0.00..51.89 rows=24 width=8)
                             Index Cond: (content_id = $0)


_____________________________________________________________

It seems to me that the query planner has some sort of bug when
predicting the costs for the query.

If i set SET ENABLE_SEQSCAN = NO; before running the query in database
2,
this is what i get from the planner:

______________________________________________________________

 Sort  (cost=826278.50..826278.51 rows=1 width=36)
   Sort Key: b.log_timestamp
   ->  Nested Loop  (cost=0.00..826278.49 rows=1 width=36)
         Join Filter: ("outer".log_timestamp = (subplan))
         ->  Index Scan using cms_log_log_action on cms_log b 
(cost=0.00..3539.29 rows=1491 width=22)
               Index Cond: (log_action = 'deleteContent'::character
varying)
         ->  Index Scan using pk__cms_contents on cms_contents a 
(cost=0.00..499.72 rows=1 width=14)
               Index Cond: ((a.id = "outer".content_id) AND
(a.domain_id = 1) AND (a."language" = 'de'::bpchar))
               Filter: ((state_id = 'd'::bpchar) AND (subplan))
               SubPlan
                 ->  Materialize  (cost=493.92..493.92 rows=2347
width=4)
                       ->  Index Scan using cms_contents_state_id on
cms_contents  (cost=0.00..493.92 rows=2347 width=4)
                             Index Cond: (state_id = 'd'::bpchar)
                             Filter: (("language" = 'de'::bpchar) AND
(domain_id = 1))
         SubPlan
           ->  Aggregate  (cost=51.95..51.95 rows=1 width=8)
                 ->  Result  (cost=0.00..51.89 rows=24 width=8)
                       One-Time Filter: (($1 = $2) AND ($3 = $4))
                       ->  Index Scan using cms_log_content_id on
cms_log  (cost=0.00..51.89 rows=24 width=8)
                             Index Cond: (content_id = $0)


______________________________________________________

So the planner predicts that the costs for the select with index usage
are about 3 times higher than without index usage, whereas in fact it
is a lot faster.

How can i improve the predicion? Am i to change the query?


Here are the table layouts:

______________________________________________

cms_contents:
______________________________________________


        Column         |          Type          |           Modifiers
-----------------------+------------------------+-------------------------------
 id                    | integer                | not null
 domain_id             | integer                | not null
 language              | character(2)           | not null
 state_id              | character(1)           | not null
 type_name             | character varying(50)  | not null
 parent_id             | integer                | not null
 sort_children_by      | character varying(100) | not null default
'order_hint'
 show_children_in_tree | bit(1)                 | not null default
B'1'::"bit"
 is_online             | bit(1)                 | 

Indexes: pk__cms_contents primary key btree (id, domain_id,
"language"),
         cms_contents_children_index btree (domain_id, parent_id,
"language", state_id, is_online),
         cms_contents_state_id btree (state_id)



______________________________________________

cms_log:
______________________________________________


    Column     |            Type             |       Modifiers        
---------------+-----------------------------+------------------------
 content_id    | integer                     | not null
 domain_id     | integer                     | not null
 language      | character(2)                | not null
 log_timestamp | timestamp without time zone | not null default now()
 log_user      | character varying(50)       | not null
 log_action    | character varying(50)       | not null
Indexes: cms_log_content_id btree (content_id),
         cms_log_content_id_domain_action_language btree (content_id,
domain_id, "language", log_action),
         cms_log_log_action btree (log_action),
         cms_log_log_timestamp btree (log_timestamp)



Here are the table statistics:

database 1:
  cms_contents: 13109 entries
  cms_log: 119166 entries

database 2:
  cms_contents: 10436 entries
  cms_log: 105922 entries


Any advice appreciated,

Kind Regards,
R. Willmington

pgsql-admin by date

Next:From: Allen SmithDate: 2004-11-03 19:39:23
Subject: Help need to restore dropped db
Previous:From: Werner BohlDate: 2004-11-02 19:23:30
Subject: Postgresql beta4 rpms

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group