Re: Planner choice & tuning

From: "Scott Marlowe" <smarlowe(at)qwest(dot)net>
To: "Ed L(dot)" <pgsql(at)bluepolka(dot)net>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Planner choice & tuning
Date: 2004-06-22 23:24:24
Message-ID: 1087946664.1187.962.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Tue, 2004-06-22 at 16:47, Ed L. wrote:
> The planner is choosing a sequential scan for my query. I am trying
> to understand why since it is clearly not the fastest choice, and
> what the proper tuning dial is to adjust here. Here's the query:
>
> select count(1) from measurement where measure_id = 98;
>
> TIA. Here are the details:
>
> % psql -c "\d measurement"
> Table "public.measurement"
> Column | Type | Modifiers
> ------------+--------------------------+-------------------------------------------------------------
> id | integer | not null default nextval('public.measurement_id_seq'::text)
> host_id | integer | not null
> measure_id | integer | not null
> value | double precision | not null
> duration | double precision | not null default 0.0
> moment | timestamp with time zone | not null default ('now'::text)::timestamp(6) with time zone
> source_id | integer | not null default 1
> Indexes: measurement_duration_idx btree (duration),
> measurement_host_idx btree (host_id),
> measurement_measure_idx btree (measure_id),
> measurement_moment_idx btree (moment),
> measurement_source_idx btree (source_id),
> measurement_value_idx btree (value)
> Foreign Key constraints: $2 FOREIGN KEY (measure_id) REFERENCES measure(id) ON UPDATE NO ACTION ON DELETE NO ACTION,
> $1 FOREIGN KEY (source_id) REFERENCES measurement_source(id) ON UPDATE NO ACTION ON DELETE NO ACTION
>
> % psql -c "analyze measurement"
> ANALYZE
>
> % psql -c "select count(1) from measurement"
> count
> ---------
> 1800866
> (1 row)
>
>
> % psql -c "select count(1) from measurement where measure_id = 98"
> count
> -------
> 38862
> (1 row)
>
> % time psql -c "set enable_seqscan=no; explain analyze select count(1) from measurement where measure_id = 98"
> QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------------------------------
> Aggregate (cost=215208.66..215208.66 rows=1 width=0) (actual time=904.58..904.58 rows=1 loops=1)
> -> Index Scan using measurement_measure_idx on measurement (cost=0.00..215062.64 rows=58408 width=0) (actual time=0.17..843.78 rows=38866 loops=1)
> Index Cond: (measure_id = 98)
> Total runtime: 904.77 msec
> (4 rows)
>
>
> real 0m1.298s
> user 0m0.010s
> sys 0m0.000s
>
> % time psql -c "explain analyze select count(1) from measurement where measure_id = 98"
> QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------
> Aggregate (cost=97857.09..97857.09 rows=1 width=0) (actual time=2493.90..2493.90 rows=1 loops=1)
> -> Seq Scan on measurement (cost=0.00..97711.07 rows=58408 width=0) (actual time=12.94..2430.08 rows=38866 loops=1)
> Filter: (measure_id = 98)
> Total runtime: 2494.11 msec
> (4 rows)
>
>
> real 0m2.885s
> user 0m0.000s
> sys 0m0.000s
>
>
> This seems to be saying the planner thinks its less expensive to do the
> sequential scan, but why?
>
> Including pg_stats data in case it is relevant here.
>
> % psql -c "select * from pg_stats where tablename = 'measurement' and attname = 'measure_id'"
> schemaname | tablename | attname | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation
> ------------+-------------+------------+-----------+-----------+------------+-------------------------------------------+----------------------------------------------------------------------------------+-----------------------------------------+-------------
> public | measurement | measure_id | 0 | 4 | 52 | {104,108,113,124,106,110,101,107,112,109} | {0.084,0.0556667,0.052,0.05,0.048,0.0473333,0.0383333,0.0363333,0.034,0.0326667} | {23,36,39,43,85,89,100,111,120,122,128} | 0.232087
> (1 row)
>
> % psql -c "select name, setting from pg_settings where name like 'random%'"
> name | setting
> ------------------+---------
> random_page_cost | 4
> (1 row)

I'd say your random_page_cost is too low for your setup. While there's
a slight misguess on the planner in the number of rows (38k verus 58k)
it's not that big. Try values between 1.2 and 2.0. Most larger servers
with plenty of memory work well around 1.3 to 1.5. Also, make sure your
effective_cache_size is properly set.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Scott Marlowe 2004-06-22 23:30:16 Re: More psql problems... >.<
Previous Message Tom Lane 2004-06-22 23:04:35 Re: Connection gets into state where all queries fail