Planner choice & tuning

From: "Ed L(dot)" <pgsql(at)bluepolka(dot)net>
To: pgsql-general(at)postgresql(dot)org
Subject: Planner choice & tuning
Date: 2004-06-22 22:47:47
Message-ID: 200406221647.47507.pgsql@bluepolka.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


The planner is choosing a sequential scan for my query. I am trying
to understand why since it is clearly not the fastest choice, and
what the proper tuning dial is to adjust here. Here's the query:

select count(1) from measurement where measure_id = 98;

TIA. Here are the details:

% psql -c "\d measurement"
Table "public.measurement"
Column | Type | Modifiers
------------+--------------------------+-------------------------------------------------------------
id | integer | not null default nextval('public.measurement_id_seq'::text)
host_id | integer | not null
measure_id | integer | not null
value | double precision | not null
duration | double precision | not null default 0.0
moment | timestamp with time zone | not null default ('now'::text)::timestamp(6) with time zone
source_id | integer | not null default 1
Indexes: measurement_duration_idx btree (duration),
measurement_host_idx btree (host_id),
measurement_measure_idx btree (measure_id),
measurement_moment_idx btree (moment),
measurement_source_idx btree (source_id),
measurement_value_idx btree (value)
Foreign Key constraints: $2 FOREIGN KEY (measure_id) REFERENCES measure(id) ON UPDATE NO ACTION ON DELETE NO ACTION,
$1 FOREIGN KEY (source_id) REFERENCES measurement_source(id) ON UPDATE NO ACTION ON DELETE NO ACTION

% psql -c "analyze measurement"
ANALYZE

% psql -c "select count(1) from measurement"
count
---------
1800866
(1 row)

% psql -c "select count(1) from measurement where measure_id = 98"
count
-------
38862
(1 row)

% time psql -c "set enable_seqscan=no; explain analyze select count(1) from measurement where measure_id = 98"
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=215208.66..215208.66 rows=1 width=0) (actual time=904.58..904.58 rows=1 loops=1)
-> Index Scan using measurement_measure_idx on measurement (cost=0.00..215062.64 rows=58408 width=0) (actual time=0.17..843.78 rows=38866 loops=1)
Index Cond: (measure_id = 98)
Total runtime: 904.77 msec
(4 rows)

real 0m1.298s
user 0m0.010s
sys 0m0.000s

% time psql -c "explain analyze select count(1) from measurement where measure_id = 98"
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=97857.09..97857.09 rows=1 width=0) (actual time=2493.90..2493.90 rows=1 loops=1)
-> Seq Scan on measurement (cost=0.00..97711.07 rows=58408 width=0) (actual time=12.94..2430.08 rows=38866 loops=1)
Filter: (measure_id = 98)
Total runtime: 2494.11 msec
(4 rows)

real 0m2.885s
user 0m0.000s
sys 0m0.000s

This seems to be saying the planner thinks its less expensive to do the
sequential scan, but why?

Including pg_stats data in case it is relevant here.

% psql -c "select * from pg_stats where tablename = 'measurement' and attname = 'measure_id'"
schemaname | tablename | attname | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation
------------+-------------+------------+-----------+-----------+------------+-------------------------------------------+----------------------------------------------------------------------------------+-----------------------------------------+-------------
public | measurement | measure_id | 0 | 4 | 52 | {104,108,113,124,106,110,101,107,112,109} | {0.084,0.0556667,0.052,0.05,0.048,0.0473333,0.0383333,0.0363333,0.034,0.0326667} | {23,36,39,43,85,89,100,111,120,122,128} | 0.232087
(1 row)

% psql -c "select name, setting from pg_settings where name like 'random%'"
name | setting
------------------+---------
random_page_cost | 4
(1 row)

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2004-06-22 22:49:05 Re: flock user defined function
Previous Message Martijn van Oosterhout 2004-06-22 22:41:38 Re: Connection gets into state where all queries fail