Re: Covering Indexes

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, David Johnston <polobo(at)yahoo(dot)com>, "David E(dot) Wheeler" <david(at)justatheory(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Covering Indexes
Date: 2012-07-26 17:17:55
Message-ID: CA+TgmoayDW03EKXExion72mgLrTTRGLRR3zo2XbAR=EL6_5Gpw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jul 26, 2012 at 12:25 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
> On Thu, Jul 26, 2012 at 11:13 AM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>> On Tue, Jul 17, 2012 at 06:00:37PM +0100, Simon Riggs wrote:
>>> > Either way the data in "c" and "d" are IN THE INDEX otherwise in neither
>>> > case could the data values be returned while strictly querying the index.
>>> >
>>> > So the question that needs to be asked is what kind of performance increase
>>> > can be had during DML (insert/update) statements and whether those gains are
>>> > worth pursuing. Since these other engines appear to allow both cases you
>>> > should be able to get at least a partial idea of the performance gains
>>> > between "index (a,b,c,d)" and "index (a,b) covering (c,d)".
>>>
>>> There is a use case, already discussed, whereby that is useful for
>>> create unique index on foo (a,b) covering (c,d)
>>>
>>> but there really isn't any functional difference between
>>> create index on foo (a,b) covering (c,d)
>>>
>>> and
>>> create index on foo (a,b,c,d)
>>>
>>> There is a potential performance impact. But as Tom says, that might
>>> even be negative if it is actually measurable.
>>
>> So, do we want a TODO item about adding columns to a unique index that
>> will not be used for uniqueness checks?
>
> I think so. The case where you want a wide multiple column primary
> key to be extended to cover that one extra commonly grabbed value is
> not super common but entirely plausible. With the existing
> infrastructure to get the advantages of index covering you'd have to
> duplicate the entire index for the extra field which really sucks:
> aside from the huge waste of time and space, you force the planner to
> play the 'which index do i use?' game.

I think it is going to take several years before we really understand
how index-only scans play out in the real world, and what factors
limit their usefulness. This one has come up a few times because it's
sort of an obvious thing to want to do and we don't have it, but I
think that there's room for some skepticism about how well it will
actually work, for reasons that have already been mentioned, and of
course also because indexing more columns potentially means defeating
HOT, which I suspect will defeat many otherwise-promising applications
of index-only scans.

That having been said, it would be unwise to speculate too much in
advance of the data, and we're not going to get any data until someone
tries implementing it, so +1 from me for putting something on the TODO
list.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2012-07-26 17:24:19 Re: Using pg_upgrade on log-shipping standby servers
Previous Message Merlin Moncure 2012-07-26 16:25:06 Re: Covering Indexes