Re: Poor index choice -- multiple indexes of the same

From: "Karl O(dot) Pinc" <kop(at)meme(dot)com>
To: josh(at)agliodbs(dot)com
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Poor index choice -- multiple indexes of the same
Date: 2005-06-28 02:36:51
Message-ID: 1119926211l.11411l.2l@mofo
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance


On 06/27/2005 05:37:41 PM, Josh Berkus wrote:
> Karl,
>
> > Seems to me that when there's a constant value in the query
> > and an = comparision it will always be faster to use the (b-tree)
> > index that's ordered first by the constant value, as then all
> further
> > blocks are guarenteed to have a higher relevant information
> > density. At least when compared with another index that has the
> > same columns in it.
>
> That really depends on the stats. Such a choice would *not* be
> appropriate if the < comparison was expected to return 1- rows while
> the =
> condition applied to 15% of the table.

We're talking internals here so I don't know what I'm talking
about, but, when the = comparison returns 15% of the table
you can find your way straight to the 1- (sic) relevent rows
because that 15% is further sorted by the second column of the
index. So that's one disk read and after that when you scan
the rest of the blocks every datum is relevant/returned.
So your scan will pass through fewer disk blocks. The only
case that would make sense to consider using the other
index is if the planner knew it could
get the answer in 1 disk read, in which case it should be
able to get the answer out of either index in one disk read
as both indexes are on the same columns.

> What is your STATISTICS_TARGET for the relevant columns set to?

STATISTICS_TARGET is the default, which I read as 10 the docs.

> When's
> the last time you ran analyze?

I'm doing this in a torture test script, loading data.
Every fibnocci number of rows * 100 I VACCUM ANALYZE.
So, 100, 200, 300, 500, 800, etc.

Just for grins I've created the index I'd like it to use
and run VACUUM ANALYZE and shown the EXPLAIN ANALYZE below.

> If this is all updated, you want to
> post
> the pg_stats rows for the relevant columns?

Pg_stats rows below. (I've tried to wrap the lines short
so as not to mess up anybody's mailer.)

# create index census_sname_date on census (sname, date);
CREATE INDEX
# vacuum analyze census;
VACUUM
# explain analyze select 1 from census where date < '1975-9-21'
and sname = 'RAD' and status != 'A' ;
QUERY
PLAN
---------------------------------------------------------------
---------------------------------------------------------------
----
Index Scan using census_date_sname on census (cost=0.00..2169.51
rows=1437 width=0) (actual time=40.610..40.610 rows=0 loops=1)
Index Cond: ((date < '1975-09-21'::date) AND (sname =
'RAD'::bpchar))
Filter: (status <> 'A'::bpchar)
Total runtime: 40.652 ms
(4 rows)

Compare with:

# drop index census_date_sname;
DROP INDEX
# explain analyze select date from census where sname = 'RAD'
and date < '1975-9-21' and status != 'A' limit 1;
QUERY
PLAN
-------------------------------------------------------------------
-------------------------------------------------------------------
Limit (cost=0.00..3.37 rows=1 width=4) (actual time=0.097..0.097
rows=0 loops=1)
-> Index Scan using census_sname_date on census
(cost=0.00..5203.95 rows=1544 width=4) (actual time=0.094..0.094
rows=0 loops=1)
Index Cond: ((sname = 'RAD'::bpchar) AND (date <
'1975-09-21'::date))
Filter: (status <> 'A'::bpchar)
Total runtime: 0.133 ms
(5 rows)

# select * from pg_stats where tablename = 'census' and (attname =
'sname' or attname = 'date');
schemaname | tablename | attname | null_frac | avg_width | n_distinct
| most_common_vals | most_common_freqs | histogram_bounds |
correlation
------------+-----------+---------+-----------+-----------+-----------
-+--------------------------------------------------------------------
---------------------------------------------+------------------------
----------------------------------------------------------------------
--------------+-------------------------------------------------------
---------------------------------------------------------------------+
-------------
babase | census | date | 0 | 4 | 4687 |
{1979-02-01,1976-06-16,1977-03-23,1978-08-25,1979-09-20,1971-06-28
,1972-04-28,1972-08-27,1974-04-06,1975-03-19}
|
{0.002,0.00166667,0.00166667,0.00166667,0.00166667,0.00133333
,0.00133333,0.00133333,0.00133333,0.00133333}
|
{1959-07-15,1966-02-18,1969-02-22,1971-01-10,1972-07-26,1974-02-09
,1975-05-27,1976-07-28,1977-08-19,1978-08-07,1979-10-02}
| 1
babase | census | sname | 0 | 7 | 177 |
{MAX,ALT,PRE,COW,EST,JAN,RIN,ZUM,DUT,LUL} |
{0.0166667,0.015,0.015,0.0146667
,0.0143333,0.014,0.0136667,0.0136667,0.0133333,0.0133333}
| {ALI,BUN,FAN,IBI,LER,NDO,PET,RUS,SLM,TOT,XEN} | 0.0446897
(2 rows)

Thanks.

Karl <kop(at)meme(dot)com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message David Mitchell 2005-06-28 02:37:34 How can I speed up this function?
Previous Message Karl O. Pinc 2005-06-28 01:54:08 Re: Performance analysis of plpgsql code