Although indexes in PostgreSQL do not need maintenance or tuning, it is still important to check which indexes are actually used by the real-life query workload. Examining index usage for an individual query is done with the EXPLAIN command; its application for this purpose is illustrated in Section 14.1. It is also possible to gather overall statistics about index usage in a running server, as described in Section 28.2.
It is difficult to formulate a general procedure for determining which indexes to create. There are a number of typical cases that have been shown in the examples throughout the previous sections. A good deal of experimentation is often necessary. The rest of this section gives some tips for that:
Always run ANALYZE first. This command
collects statistics about the distribution of the values in
the table. This information is required to estimate the
number of rows returned by a query, which is needed by the
planner to assign realistic costs to each possible query
plan. In absence of any real statistics, some default
values are assumed, which are almost certain to be
inaccurate. Examining an application's index usage without
therefore a lost cause. See Section 24.1.3
and Section 24.1.6
for more information.
Use real data for experimentation. Using test data for setting up indexes will tell you what indexes you need for the test data, but that is all.
It is especially fatal to use very small test data sets. While selecting 1000 out of 100000 rows could be a candidate for an index, selecting 1 out of 100 rows will hardly be, because the 100 rows probably fit within a single disk page, and there is no plan that can beat sequentially fetching 1 disk page.
Also be careful when making up test data, which is often unavoidable when the application is not yet in production. Values that are very similar, completely random, or inserted in sorted order will skew the statistics away from the distribution that real data would have.
When indexes are not used, it can be useful for testing
to force their use. There are run-time parameters that can
turn off various plan types (see Section 19.7.1).
For instance, turning off sequential scans (
enable_seqscan) and nested-loop joins
enable_nestloop), which are
the most basic plans, will force the system to use a
different plan. If the system still chooses a sequential
scan or nested-loop join then there is probably a more
fundamental reason why the index is not being used; for
example, the query condition does not match the index.
(What kind of query can use what kind of index is explained
in the previous sections.)
If forcing index usage does use the index, then there
are two possibilities: Either the system is right and using
the index is indeed not appropriate, or the cost estimates
of the query plans are not reflecting reality. So you
should time your query with and without indexes. The
EXPLAIN ANALYZE command can be
If it turns out that the cost estimates are wrong, there are, again, two possibilities. The total cost is computed from the per-row costs of each plan node times the selectivity estimate of the plan node. The costs estimated for the plan nodes can be adjusted via run-time parameters (described in Section 19.7.2). An inaccurate selectivity estimate is due to insufficient statistics. It might be possible to improve this by tuning the statistics-gathering parameters (see ALTER TABLE).
If you do not succeed in adjusting the costs to be more appropriate, then you might have to resort to forcing index usage explicitly. You might also want to contact the PostgreSQL developers to examine the issue.
If you see anything in the documentation that is not correct, does not match your experience with the particular feature or requires further clarification, please use this form to report a documentation issue.