An index can be defined on more than one column of a table. For example, if you have a table of this form:
CREATE TABLE test2 ( major int, minor int, name varchar );
(say, you keep your /dev directory in a database...) and you frequently make queries like
SELECT name FROM test2 WHERE major = constant AND minor = constant;
then it may be appropriate to define an index on the columns major and minor together, e.g.,
CREATE INDEX test2_mm_idx ON test2 (major, minor);
Currently, only the B-tree and GiST index types support multicolumn indexes. Up to 32 columns may be specified. (This limit can be altered when building PostgreSQL; see the file pg_config_manual.h.)
A multicolumn B-tree index can be used with query conditions that involve any subset of the index's columns, but the index is most efficient when there are constraints on the leading (leftmost) columns. The exact rule is that equality constraints on leading columns, plus any inequality constraints on the first column that does not have an equality constraint, will be used to limit the portion of the index that is scanned. Constraints on columns to the right of these columns are checked in the index, so they save visits to the table proper, but they do not reduce the portion of the index that has to be scanned. For example, given an index on (a, b, c) and a query condition WHERE a = 5 AND b >= 42 AND c < 77, the index would have to be scanned from the first entry with a = 5 and b = 42 up through the last entry with a = 5. Index entries with c >= 77 would be skipped, but they'd still have to be scanned through. This index could in principle be used for queries that have constraints on b and/or c with no constraint on a — but the entire index would have to be scanned, so in most cases the planner would prefer a sequential table scan over using the index.
A multicolumn GiST index can be used with query conditions that involve any subset of the index's columns. Conditions on additional columns restrict the entries returned by the index, but the condition on the first column is the most important one for determining how much of the index needs to be scanned. A GiST index will be relatively ineffective if its first column has only a few distinct values, even if there are many distinct values in additional columns.
Of course, each column must be used with operators appropriate to the index type; clauses that involve other operators will not be considered.
Multicolumn indexes should be used sparingly. In most situations, an index on a single column is sufficient and saves space and time. Indexes with more than three columns are unlikely to be helpful unless the usage of the table is extremely stylized. See also Section 11.4 for some discussion of the merits of different index setups.
[11:48] <marv> i read somewhere in a general sql optimizing book that putting the column you're selecting in a multicolumn index was a good idea as it saved a table lookup. i read somewhere else that postgresql won't make use of it, however that seemed to be an old mailing list post. Is that still true? What's recommended?
[11:50] <dennisb> in pg you always need to look up the tuple in the heap no matter if it is in an index or not
[11:51] <dennisb> so adding stuff to an index that you don't use to find the row and just have in the select list isn't a win in pg
[11:53] <dennisb> the reason is that in pg the visibility information isn't stored in the index, so it need to check that the tuple should be visible to you
[11:54] <dennisb> and that's only stored in the table heap (row storage area for the table)