Skip site navigation (1) Skip section navigation (2)

Re: Postgresql selecting strange index for simple query

From: Maxim Boguk <mboguk(at)masterhost(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgresql selecting strange index for simple query
Date: 2009-02-26 21:12:19
Message-ID: (view raw or flat)
Lists: pgsql-general
Tom Lane wrote:
> Maxim Boguk <mboguk(at)masterhost(dot)ru> writes:
>> So i have two theory (just waving hands ofcourse):
>> 1)integer owerflow somewhere in cost calculation
> Costs are floats, and in any case you're not showing costs anywhere near
> the integer overflow limit...
>> 2)floating rounding errors (because cost very close in wrong situations: 254918.19 vs 259709.09)
> The planner is intentionally set up to consider costs within a percent
> or so of each other as being effectively equal.  If the estimated costs
> are that close then it doesn't surprise me if it sometimes picks the
> "wrong" plan.  The real question is why are the estimates so close?
> They should not be, since AFAICS you are talking about a situation
> where we'd have to scan all of the multicol index versus only about
> a fifth of the single-col one.

Ok i exploring more:

just one thing:

hh=# SHOW default_statistics_target ;
(1 row)
(btw increase statistic to 1000 do not fix situation).

I try simplify test case and:
Now use sequential user_id, and truncate last_change_time to date:

SELECT nextval('test_seq') as user_id,last_change_time::date ,rpad('a',500,'b') as f1 into test_table from resume;
ANALYZE test_table;
SELECT count(*) from test_table;
CREATE INDEX right_idx on test_table(last_change_time);
CREATE INDEX wrong_idx on test_table(user_id, last_change_time);
EXPLAIN ANALYZE SELECT 1 from test_table where last_change_time > '2009-01-10 00:00:00';

  Index Scan using wrong_idx on test_table  (cost=0.00..182623.51 rows=316522 width=0) (actual time=0.056..534.620 rows=382671 loops=1)
    Index Cond: (last_change_time > '2009-01-10'::date)

DROP INDEX wrong_idx;
EXPLAIN ANALYZE SELECT 1 from test_table where last_change_time > '2009-01-10 00:00:00';

  Index Scan using right_idx on test_table  (cost=0.00..221765.19 rows=316522 width=0) (actual time=0.023..346.213 rows=382671 loops=1)
    Index Cond: (last_change_time > '2009-01-10'::date)

Full index scan over wrong index cost reasonable lower then 1/17 of single column index  (182623 vs 221765)!

So just last_change_time still cannot be generated... but:

hh=# SELECT count(distinct last_change_time) from test_table;
(1 row)

And statistic values for last_change_time is:

hh=# SELECT * from pg_stats where tablename='test_table' and attname='last_change_time';
-[ RECORD 1 ]-----+---------------------------------------------------------------------------------------------------------------------------
schemaname        | public
tablename         | test_table
attname           | last_change_time
null_frac         | 0
avg_width         | 4
n_distinct        | 1211
most_common_vals  | {2004-02-27,2009-01-26,2008-03-11,2009-01-27,2004-01-15,2008-10-13,2009-01-19,2009-01-22,2009-01-21,2008-09-29}
most_common_freqs | {0.00833333,0.00766667,0.00733333,0.007,0.00666667,0.00533333,0.00533333,0.00533333,0.005,0.00466667}
histogram_bounds  | {2003-03-31,2005-08-26,2006-10-04,2007-04-29,2007-09-27,2008-01-24,2008-05-01,2008-07-29,2008-10-01,2008-11-27,2009-01-25}
correlation       | 0.261512

I think it is all what planner can use when choose plan... because user_id is unique sequential values.

regargs, Maxim Boguk

In response to


pgsql-general by date

Next:From: Tim UckunDate: 2009-02-26 21:19:10
Subject: Re: speaking of 8.4...
Previous:From: Adrian KlaverDate: 2009-02-26 21:03:24
Subject: Re: Connection refused (0x0000274D/10061).

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group