Bad Planner Statistics for Uneven distribution.

From: "Kevin McArthur" <Kevin(at)StormTide(dot)ca>
To: <pgsql-performance(at)postgresql(dot)org>
Subject: Bad Planner Statistics for Uneven distribution.
Date: 2006-07-21 18:15:28
Message-ID: 003b01c6acf1$a66d8910$0701a8c0@kdesktop
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I discussed this with a few members of #postgresql freenode this morning. I'll keep it breif; [note: i have cleaned out columns not relevant]

I have two tables, brands and models_brands. The first has about 300 records, the later about 350,000 records. The number of distinct brands in the models_brands table is < 10.

=# \d models_brands
Table "public.models_brands"
Column | Type | Modifiers
--------+-----------------------+-----------
model | integer | not null
brand | integer | not null
Indexes:
"models_brands_brand" btree (brand)
Foreign-key constraints:
"models_brands_brand_fkey" FOREIGN KEY (brand) REFERENCES brands(brand_id) ON UPDATE CASCADE ON DELETE CASCADE
"models_brands_model_fkey" FOREIGN KEY (model) REFERENCES models(model_id) ON UPDATE CASCADE ON DELETE CASCADE

a=# \d brands;
Table "public.brands"
Column | Type | Modifiers
------------+------------------------+-----------------------------------------------------------
brand_id | integer | not null default nextval('brands_brand_id_seq'::regclass)
brand_name | character varying(255) | not null
Indexes:
"brands_pkey" PRIMARY KEY, btree (brand_id)

Now the plans/problems..

=# set enable_seqscan to on;
SET
=# explain analyze select distinct brand from models_brands;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Unique (cost=46300.70..48148.15 rows=4 width=4) (actual time=3699.691..6215.216 rows=4 loops=1)
-> Sort (cost=46300.70..47224.43 rows=369489 width=4) (actual time=3699.681..5027.069 rows=369489 loops=1)
Sort Key: brand
-> Seq Scan on models_brands (cost=0.00..6411.89 rows=369489 width=4) (actual time=0.040..1352.997 rows=369489 loops=1)
Total runtime: 6223.666 ms
(5 rows)

=# set enable_seqscan to off;
SET
=# explain analyze select distinct brand from models_brands;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=0.00..863160.68 rows=4 width=4) (actual time=0.131..2584.779 rows=4 loops=1)
-> Index Scan using models_brands_brand on models_brands (cost=0.00..862236.96 rows=369489 width=4) (actual time=0.122..1440.809 rows=369489 loops=1)
Total runtime: 2584.871 ms
(3 rows)

Picks the wrong plan here. Should pick the index with seqscanning enabled.

More (as a different wording/query)... (as suggested by others on irc)

=# set enable_seqscan to on;
SET
=# explain analyze select brand_id from brands where exists (select 1 from models_brands where brand = brands.brand_id);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Seq Scan on brands (cost=0.00..30.09 rows=152 width=4) (actual time=7742.460..62567.543 rows=4 loops=1)
Filter: (subplan)
SubPlan
-> Seq Scan on models_brands (cost=0.00..7335.61 rows=92372 width=0) (actual time=206.467..206.467 rows=0 loops=303)
Filter: (brand = $0)
Total runtime: 62567.626 ms

a=# set enable_seqscan to off;
SET
=# explain analyze select brand_id from brands where exists (select 1 from models_brands where brand = brands.brand_id);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on brands (cost=100000000.00..100000715.90 rows=152 width=4) (actual time=0.615..3.710 rows=4 loops=1)
Filter: (subplan)
SubPlan
-> Index Scan using models_brands_brand on models_brands (cost=0.00..216410.97 rows=92372 width=0) (actual time=0.008..0.008 rows=0 loops=303)
Index Cond: (brand = $0)
Total runtime: 3.790 ms

It was also tried to similar results with a LIMIT 1 in the subquery for exist.

More...

Seqscan still off..

=# explain analyze select distinct brand_id from brands inner join models_brands on (brand_id = brand);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=0.00..867782.58 rows=303 width=4) (actual time=0.391..4898.579 rows=4 loops=1)
-> Merge Join (cost=0.00..866858.85 rows=369489 width=4) (actual time=0.383..3749.771 rows=369489 loops=1)
Merge Cond: ("outer".brand_id = "inner".brand)
-> Index Scan using brands_pkey on brands (cost=0.00..15.53 rows=303 width=4) (actual time=0.080..0.299 rows=60 loops=1)
-> Index Scan using models_brands_brand on models_brands (cost=0.00..862236.96 rows=369489 width=4) (actual time=0.013..1403.175 rows=369489 loops=1)
Total runtime: 4898.697 ms

=# set enable_seqscan to on;
SET
=# explain analyze select distinct brand_id from brands inner join models_brands on (brand_id = brand);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=46300.70..52770.04 rows=303 width=4) (actual time=3742.046..8560.833 rows=4 loops=1)
-> Merge Join (cost=46300.70..51846.32 rows=369489 width=4) (actual time=3742.035..7406.677 rows=369489 loops=1)
Merge Cond: ("outer".brand_id = "inner".brand)
-> Index Scan using brands_pkey on brands (cost=0.00..15.53 rows=303 width=4) (actual time=0.077..0.407 rows=60 loops=1)
-> Sort (cost=46300.70..47224.43 rows=369489 width=4) (actual time=3741.584..5051.348 rows=369489 loops=1)
Sort Key: models_brands.brand
-> Seq Scan on models_brands (cost=0.00..6411.89 rows=369489 width=4) (actual time=0.027..1346.178 rows=369489 loops=1)
Total runtime: 8589.502 ms
(8 rows)

Hope that helps

Kevin McArthur

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Bruno Wolff III 2006-07-21 18:33:43 Re: BUG #2543: Performance delay acrros the same day
Previous Message Robert Lor 2006-07-21 15:11:58 Re: Sun Donated a Sun Fire T2000 to the PostgreSQL