No heap lookups on index

From: David Scott <davids(at)apptechsys(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: No heap lookups on index
Date: 2006-01-18 20:14:12
Message-ID: 43CEA194.8020105@apptechsys.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

Allow me a brief introduction. I work in a company who contracts
intelligence analysis software to the government. We are currently
developing a product which is using PostgreSQL at it's core. Due to the
licensing of the product and the integration with perl this is our first
choice in database solutions.

We are, however, currently stuck. We are storing millions of rows and
require very high query performance. We have spent the last several
months tweaking, list lurking and researching all the various tweaks and
performance enhancements and have come to the conclusion that our
biggest slowdown is validating the index rows which match our selection
criteria against the heap values. In general cases there is a very
small amount required for this, but in our extreme use cases we are
finding this to slow our queries by an unacceptable amount of time.

We would like to resolve this issue. In that endeavor we have done some
feasibility analysis (either to write a patch ourselves or attempt to
commission an expert to do so), starting with the archives for this
list. We found several posts discussing the issue and it seems that the
complexity of storing the tuple visibility information inside of the
index rows is prohibitive for simple indexes.

I have used SQL Server in the past and have noticed that bookmark
lookups are avoided because they force the query executor to actually
fetch the data page off of disk, rather then return the values that
exist in the index. I have verified times against the PostgreSQL
installation and SQL Server to verify that the SQL Server queries come
back at roughly the same speed when avoiding bookmark lookups as
Postgres queries accessing clustered tables using the index the table is
clustered on.

Since I am sure everyone is tired of the intro by now, I'll get to the
questions:
Do commercial databases implement MVCC in a way that allows an
efficient implementation of index lookups that can avoid heap lookups?
Is there any way to modify PostgreSQL to allow index lookups without
heap validation that doesn't involve re-writing the MVCC implementation
of keeping dead rows on the live table?
Is the additional overhead of keeping full tuple visibility
information inside of the index so odious to the Postgres community as
to prevent a patch with this solution from being applied back to the
head? Maybe as an optional use feature? We would prefer this solution
for our needs over the bitmap of heap pages listed in the TODO list
because we want to ensure optimal query times, regardless of the state
of the cache and because we are concerned with performance in the face
of concurrent updates on the page level.

Thanks for any thoughts on this, I know this is a perennial topic, but
we are seriously considering contributing either code or money to the
solution of this problem.

David Scott
Applied Technical Systems, Inc.

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2006-01-18 20:50:43 Re: No heap lookups on index
Previous Message Alex Turner 2006-01-18 19:43:36 Re: Postgres Wierdness

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2006-01-18 20:50:43 Re: No heap lookups on index
Previous Message Jaime Casanova 2006-01-18 19:21:34 log_min_messages and debug levels