Re: Index on two columns not used

From: Markus Schaber <schabi(at)logix-tt(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Cc: Péter Kovács <peter(dot)kovacs(at)chemaxon(dot)hu>
Subject: Re: Index on two columns not used
Date: 2006-10-23 13:25:49
Message-ID: 453CC2DD.4070608@logix-tt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi, Peter,

Péter Kovács wrote:
> Sorry for the amateurish question, but what are "heap tuples"?
>
> Also, my understanding is that the following statement applies only for
> composite indexes: "PostgreSQL can't use the values stored in the index
> to check the join condition". I assume that PostgreSQL will be able to
> use single-column-indexes for join conditions. Is this correct?

Both questions are tightly related:

First, the "heap" is the part of the table where the actual tuples are
stored.

PostgreSQL uses an MVCC system, that means that multiple versions (with
their transaction information) of a single row can coexist in the heap.
This allows for higher concurrency in the backend.

Now, the index basically stores pointers like "pages 23 and 42 contain
rows with value 'foo'", but version information is not replicated to the
index pages, this keeps the index' size requirements low.

Additionally, in most UPDATE cases, the new row version will fit into
the same page as the old version. In this case, the index does not have
to be changed, which is an additional speed improvement.

But when accessing the data via the index, it can only give a
preselection of pages that contain interesting data, and PostgreSQL has
to look into the actual heap pages to check whether there really are row
versions that are visible in the current transaction.

A further problem is that some GIST index types are lossy, that means
the index does not retain the full information, but only an
approximation, for efficiency reasons.

A prominent example are the PostGIS geometry indices, they only store
the bounding box (4 float values) instead of the whole geometry (may be
millions of double precision coordinates). So it may be necessary to
re-check the condition with the real data, using the lossy index for
preselection.

HTH,
Markus
--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Alvaro Herrera 2006-10-23 13:50:11 Re: Index on two columns not used
Previous Message John Philips 2006-10-23 12:16:37 Re: Optimizing disk throughput on quad Opteron