Skip site navigation (1) Skip section navigation (2)

query planning and partitioned tables

From: Colton Smith <smith(at)skio(dot)peachnet(dot)edu>
To: pgsql-admin(at)postgresql(dot)org
Subject: query planning and partitioned tables
Date: 2005-11-24 16:12:19
Message-ID: 4385E663.6000305@skio.peachnet.edu (view raw or flat)
Thread:
Lists: pgsql-admin
Hi:

  I have a 440 million row table that I'm attempting to partition.  The 
table is named 'pressure' and holds pressure data from an undersea 
sensor measuring wave height.
The sensor reports every half second.

  The partitions are named 'p0', 'p1' ... 'pN' and are divvied according 
to measurement date.

  See the following:

                                            Table "public.pressure"
      Column      |            Type             
|                           Modifiers                          
------------------+-----------------------------+---------------------------------------------------------------
 pressure_id      | integer                     | not null default 
nextval(('pressure_id_seq'::text)::regclass)
 row_entry_date   | timestamp with time zone    |
 sensor_id        | integer                     | not null
 measurement_date | timestamp without time zone |
 pressure         | double precision            |
Indexes:
    "pressure_pkey" PRIMARY KEY, btree (pressure_id)
    "pressure_measurement_date_index" btree (measurement_date)
Foreign-key constraints:
    "sensor_id_exists" FOREIGN KEY (sensor_id) REFERENCES 
sensor(sensor_id) ON DELETE RESTRICT
Tablespace: "diskvol2"

                                               Table "public.p0"
      Column      |            Type             
|                           Modifiers                          
------------------+-----------------------------+---------------------------------------------------------------
 pressure_id      | integer                     | not null default 
nextval(('pressure_id_seq'::text)::regclass)
 row_entry_date   | timestamp with time zone    |
 sensor_id        | integer                     | not null
 measurement_date | timestamp without time zone |
 pressure         | double precision            |
Indexes:
    "p0_pressure_id_index" UNIQUE, btree (pressure_id), tablespace 
"diskvol2"
    "p0_measurement_date_index" btree (measurement_date), tablespace 
"diskvol2"
Check constraints:
    "p0_measurement_date_check" CHECK (measurement_date >= '1999-08-02 
00:00:00'::timestamp without time zone AND measurement_date < 
'2000-01-06 00:00:00'::timestamp without time zone)
Foreign-key constraints:
    "sensor_id_exists" FOREIGN KEY (sensor_id) REFERENCES 
sensor(sensor_id) ON DELETE RESTRICT
Inherits: pressure
Tablespace: "diskvol2"



My question involves how the database performs the following queries:

explain select max(measurement_date) from pressure;
                                    QUERY 
PLAN                                    
-----------------------------------------------------------------------------------
 Aggregate  (cost=480311.50..480311.51 rows=1 width=8)
   ->  Append  (cost=0.00..425345.20 rows=21986520 width=8)
         ->  Seq Scan on pressure  (cost=0.00..22.30 rows=1230 width=8)
         ->  Seq Scan on p0 pressure  (cost=0.00..425322.90 
rows=21985290 width=8)
(4 rows)

explain select max(measurement_date) from p0;
                                                      QUERY 
PLAN                                                      
-----------------------------------------------------------------------------------------------------------------------
 Result  (cost=0.02..0.03 rows=1 width=0)
   InitPlan
     ->  Limit  (cost=0.00..0.02 rows=1 width=8)
           ->  Index Scan Backward using p0_measurement_date_index on 
p0  (cost=0.00..531221.19 rows=21985290 width=8)
                 Filter: (measurement_date IS NOT NULL)
(5 rows)

The optimizer doesn't do as well with the former as the latter.  Is that 
the expected behavior or have I screwed things up?

Also, consider the following:

explain select * from pressure where measurement_date < '2000-01-01';
                                                  QUERY 
PLAN                                                 
--------------------------------------------------------------------------------------------------------------
 Result  (cost=0.00..480293.25 rows=21092480 width=32)
   ->  Append  (cost=0.00..480293.25 rows=21092480 width=32)
         ->  Index Scan using pressure_measurement_date_index on 
pressure  (cost=0.00..7.13 rows=69 width=32)
               Index Cond: (measurement_date < '2000-01-01 
00:00:00'::timestamp without time zone)
         ->  Seq Scan on p0 pressure  (cost=0.00..480286.12 
rows=21092411 width=32)
               Filter: (measurement_date < '2000-01-01 
00:00:00'::timestamp without time zone)
(6 rows)

explain select * from p0 where measurement_date < '2000-01-01';
                                    QUERY 
PLAN                                    
-----------------------------------------------------------------------------------
 Seq Scan on p0  (cost=0.00..480286.12 rows=21092411 width=32)
   Filter: (measurement_date < '2000-01-01 00:00:00'::timestamp without 
time zone)
(2 rows)

The query planner seems to be ignoring p0's index on measurement_date.

Contrast that to the plan made for a similar query made on a different 
table (unpartitioned) in the same database:

explain select * from wind where measurement_date < '2000-01-01';
                                           QUERY 
PLAN                                          
------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on wind  (cost=474.17..28855.07 rows=75192 width=116)
   Recheck Cond: (measurement_date < '2000-01-01 00:00:00'::timestamp 
without time zone)
   ->  Bitmap Index Scan on wind_measurement_date_index  
(cost=0.00..474.17 rows=75192 width=0)
         Index Cond: (measurement_date < '2000-01-01 
00:00:00'::timestamp without time zone)
(4 rows)

Now, that looks like a nice plan (I guess).

My question: Are my partitions  constructed in such a way that prevents 
the planner from picking better plans? Or is that
the way things stand right now?

As always, thanks for your help.











Responses

pgsql-admin by date

Next:From: Colton SmithDate: 2005-11-24 19:06:08
Subject: pgstattuple, vacuum and free_space
Previous:From: Jaime CasanovaDate: 2005-11-24 15:47:18
Subject: Re: Postgres Database slow

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group