Help with unpredictable use of indexes on large tables...

From: Mike Leahy <mgleahy(at)alumni(dot)uwaterloo(dot)ca>
To: pgsql-general(at)postgresql(dot)org
Subject: Help with unpredictable use of indexes on large tables...
Date: 2006-04-27 23:03:28
Message-ID: 44514DC0.1030601@alumni.uwaterloo.ca
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hello list,

I've been having a bit of difficulty getting Postgres to use indexes on
some large tables that I have. Included below are the results from
'explain analyze' for two queries that should get the unique years of
data that are available from two different tables (tbl_ind_schools_edu
and tbl_ind_schools_con). The *_edu table has two years of data in it
while the *_con table has one year, so *_edu essentially has nearly
twice as many rows (i.e., 55k vs. 26k). There is an integer column
called "year" in each table that flags what year each row of data is
from, and a btree index on this column for both tables (called
"schoolse_year" and "schoolsc_year" respectively). Both tables have
been fully vacuumed/analyzed. When I try to get the distinct number of
years from these tables, it does a sequential scan to get two unique
values from the "year" column in the *_edu table, but it uses an index
scan to get a single unique value from the "year" column from the *_con
table. In both cases, I would have expected the index scan to be used.
I also tried this using the year column as group by clause, but in that
case, neither of the queries use the index scan.

I know I can force PostgreSQL to use indexes by setting enable_seqscan
to off, and this does improve the performance of the query. But I'm
wondering why the query analyzer doesn't use this index on the larger
table. I have several other tables of a similar nature (basically the
same data aggregated at different levels), where one table has two years
of data, and the other has one. In several cases, the table with two
years never utilizes its index on the year column unless I force it to
do so. I should point out that the table with two years of data also
has a much larger number of columns, all with indexes since they are all
potentially used for querying subsets from the tables. Is there
something particularly wrong that I might doing (or something that I'm
not doing) to prevent the indexes from being properly used?

Thanks in advance for any advice...
Mike

========================================================================

dbname=# explain analyze select distinct year from tbl_ind_schools_edu;
QUERY PLAN

------------------------------------------------------------------------
Unique (cost=32302.16..32579.31 rows=2 width=2) (actual
time=1545.911..2084.170 rows=2 loops=1)
-> Sort (cost=32302.16..32440.74 rows=55431 width=2) (actual
time=1545.901..1892.026 rows=55431 loops=1)
Sort Key: "year"
-> Seq Scan on tbl_ind_schools_edu (cost=0.00..27485.31
rows=55431 width=2) (actual time=0.074..1180.303 rows=55431 loops=1)
Total runtime: 2085.294 ms
(5 rows)

dbname=# explain analyze select distinct year from tbl_ind_schools_con;
QUERY PLAN

------------------------------------------------------------------------
Unique (cost=0.00..954.37 rows=1 width=2) (actual time=76.277..372.526
rows=1 loops=1)
-> Index Scan using schoolsc_year on tbl_ind_schools_con
(cost=0.00..887.08 rows=26916 width=2) (actual time=76.265..275.314
rows=26916 loops=1)
Total runtime: 372.659 ms
(3 rows)

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2006-04-27 23:21:55 Re: Help with unpredictable use of indexes on large tables...
Previous Message Tom Lane 2006-04-27 21:54:40 Re: trying to write a many-to-many type function.