Re: Postgresql selecting strange index for simple query

From: Maxim Boguk <mboguk(at)masterhost(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgresql selecting strange index for simple query
Date: 2009-02-26 21:12:19
Message-ID: 49A705B3.9080408@masterhost.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Tom Lane wrote:
> Maxim Boguk <mboguk(at)masterhost(dot)ru> writes:
>> So i have two theory (just waving hands ofcourse):
>> 1)integer owerflow somewhere in cost calculation
>
> Costs are floats, and in any case you're not showing costs anywhere near
> the integer overflow limit...
>
>> 2)floating rounding errors (because cost very close in wrong situations: 254918.19 vs 259709.09)
>
> The planner is intentionally set up to consider costs within a percent
> or so of each other as being effectively equal. If the estimated costs
> are that close then it doesn't surprise me if it sometimes picks the
> "wrong" plan. The real question is why are the estimates so close?
> They should not be, since AFAICS you are talking about a situation
> where we'd have to scan all of the multicol index versus only about
> a fifth of the single-col one.

Ok i exploring more:

just one thing:

hh=# SHOW default_statistics_target ;
default_statistics_target
---------------------------
10
(1 row)
(btw increase statistic to 1000 do not fix situation).

I try simplify test case and:
Now use sequential user_id, and truncate last_change_time to date:

SELECT nextval('test_seq') as user_id,last_change_time::date ,rpad('a',500,'b') as f1 into test_table from resume;
ANALYZE test_table;
SELECT count(*) from test_table;
CREATE INDEX right_idx on test_table(last_change_time);
CREATE INDEX wrong_idx on test_table(user_id, last_change_time);
EXPLAIN ANALYZE SELECT 1 from test_table where last_change_time > '2009-01-10 00:00:00';

Index Scan using wrong_idx on test_table (cost=0.00..182623.51 rows=316522 width=0) (actual time=0.056..534.620 rows=382671 loops=1)
Index Cond: (last_change_time > '2009-01-10'::date)

DROP INDEX wrong_idx;
EXPLAIN ANALYZE SELECT 1 from test_table where last_change_time > '2009-01-10 00:00:00';

Index Scan using right_idx on test_table (cost=0.00..221765.19 rows=316522 width=0) (actual time=0.023..346.213 rows=382671 loops=1)
Index Cond: (last_change_time > '2009-01-10'::date)

Full index scan over wrong index cost reasonable lower then 1/17 of single column index (182623 vs 221765)!

So just last_change_time still cannot be generated... but:

hh=# SELECT count(distinct last_change_time) from test_table;
count
-------
2133
(1 row)

And statistic values for last_change_time is:

hh=# SELECT * from pg_stats where tablename='test_table' and attname='last_change_time';
-[ RECORD 1 ]-----+---------------------------------------------------------------------------------------------------------------------------
schemaname | public
tablename | test_table
attname | last_change_time
null_frac | 0
avg_width | 4
n_distinct | 1211
most_common_vals | {2004-02-27,2009-01-26,2008-03-11,2009-01-27,2004-01-15,2008-10-13,2009-01-19,2009-01-22,2009-01-21,2008-09-29}
most_common_freqs | {0.00833333,0.00766667,0.00733333,0.007,0.00666667,0.00533333,0.00533333,0.00533333,0.005,0.00466667}
histogram_bounds | {2003-03-31,2005-08-26,2006-10-04,2007-04-29,2007-09-27,2008-01-24,2008-05-01,2008-07-29,2008-10-01,2008-11-27,2009-01-25}
correlation | 0.261512

I think it is all what planner can use when choose plan... because user_id is unique sequential values.

regargs, Maxim Boguk

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tim Uckun 2009-02-26 21:19:10 Re: speaking of 8.4...
Previous Message Adrian Klaver 2009-02-26 21:03:24 Re: Connection refused (0x0000274D/10061).