This page in other versions: 9.0 / 9.1 / 9.2 / 9.3 / 9.4  |  Development versions: devel  |  Unsupported versions: 7.2 / 7.3 / 7.4 / 8.0 / 8.1 / 8.2 / 8.3 / 8.4

11.2. Index Types

PostgreSQL provides several index types: B-tree, R-tree, Hash, and GiST. Each index type uses a different algorithm that is best suited to different types of queries. By default, the CREATE INDEX command will create a B-tree index, which fits the most common situations.

B-trees can handle equality and range queries on data that can be sorted into some ordering. In particular, the PostgreSQL query planner will consider using a B-tree index whenever an indexed column is involved in a comparison using one of these operators:

<
<=
=
>=
>
Constructs equivalent to combinations of these operators, such as BETWEEN and IN, can also be implemented with a B-tree index search. (But note that IS NULL is not equivalent to = and is not indexable.)

The optimizer can also use a B-tree index for queries involving the pattern matching operators LIKE, ILIKE, ~, and ~*, if the pattern is anchored to the beginning of the string, e.g., col LIKE 'foo%' or col ~ '^foo', but not col LIKE '%bar'. However, if your server does not use the C locale you will need to create the index with a special operator class to support indexing of pattern-matching queries. See Section 11.6 below.

R-tree indexes are suited for queries on spatial data. To create an R-tree index, use a command of the form

CREATE INDEX name ON table USING RTREE (column);

The PostgreSQL query planner will consider using an R-tree index whenever an indexed column is involved in a comparison using one of these operators:

<<
&<
&>
>>
@
~=
&&
(See Section 9.10 for the meaning of these operators.)

Hash indexes can only handle simple equality comparisons. The query planner will consider using a hash index whenever an indexed column is involved in a comparison using the = operator. The following command is used to create a hash index:

CREATE INDEX name ON table USING HASH (column);

Note: Testing has shown PostgreSQL's hash indexes to perform no better than B-tree indexes, and the index size and build time for hash indexes is much worse. For these reasons, hash index use is presently discouraged.

GiST indexes are not a single kind of index, but rather an infrastructure within which many different indexing strategies can be implemented. Accordingly, the particular operators with which a GiST index can be used vary depending on the indexing strategy (the operator class). For more information see Chapter 48.

The B-tree index method is an implementation of Lehman-Yao high-concurrency B-trees. The R-tree index method implements standard R-trees using Guttman's quadratic split algorithm. The hash index method is an implementation of Litwin's linear hashing. We mention the algorithms used solely to indicate that all of these index methods are fully dynamic and do not have to be optimized periodically (as is the case with, for example, static hash methods).

Comments


May 10, 2005, 3:21 p.m.

Some info on when hash indexes might be better than B-trees (original email thread is at http://archives.postgresql.org/pgsql-general/2005-05/msg00444.php):

From the email:

It seems to me there is a window within which hash indexes are theoretically superior, but it might be pretty narrow. The basic allure of a hash index is that you look at the search key, do some allegedly-trivial computations, and go directly to the relevant index page(s); whereas a btree index requires descending through several upper levels of index pages to reach the target leaf page. On the other hand, once you reach the target index page, a hash index has no better method than linear scan through all the page's index entries to find the actually wanted key(s); in fact you have to search all the pages in that index bucket. A btree index can use binary search to find the relevant items within the page.

So it strikes me that important parameters include the index entry size and the number of entries matching any particular key value.

btree will probably win for smaller keys, on two grounds: it will have fewer tree levels to descend through, because of higher fan-out, and it will be much more efficient at finding the target entries within the target page when there are many entries per page. (As against this, it will have to work harder at each upper tree page to decide where to descend to --- but I think that's a second-order consideration.)

hash will tend to win as the number of duplicate keys increases, because its relative inefficiency at finding the matches within a particular bucket will become less significant. (The ideal situation for a hash index is to have only one actual key value per bucket. You can't really afford to store only one index entry per bucket, because of the sheer I/O volume that would result, so you need multiple entries that will all be responsive to your search.) (This also brings up the thought that it might be interesting to support hash buckets smaller than a page ... but I don't know how to make that work in an adaptive fashion.)

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group