Re: Covering Indexes

From: "David Johnston" <polobo(at)yahoo(dot)com>
To: "'David E(dot) Wheeler'" <david(at)justatheory(dot)com>, "'Simon Riggs'" <simon(at)2ndquadrant(dot)com>
Cc: "'Pg Hackers'" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Covering Indexes
Date: 2012-07-17 16:41:47
Message-ID: 01e801cd643b$0f3a27d0$2dae7770$@yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> -----Original Message-----
> From: pgsql-hackers-owner(at)postgresql(dot)org [mailto:pgsql-hackers-
> owner(at)postgresql(dot)org] On Behalf Of David E. Wheeler
> Sent: Tuesday, July 17, 2012 11:55 AM
> To: Simon Riggs
> Cc: Pg Hackers
> Subject: Re: [HACKERS] Covering Indexes
>
> On Jul 17, 2012, at 5:32 PM, Simon Riggs wrote:
>
> > CREATE INDEX ON foo (a, b, c, d);
> >
> > allows
> >
> > SELECT c, d FROM foo WHERE a = ? AND b = ?
> >
> > to use an index only scan.
> >
> > The phrase "unindexed" seems misleading since the data is clearly in
> > the index from the description on the URL you gave. And since the
> > index is non-unique, I don't see any gap between Postgres and
> > SQLliite4.
>
> Yeah, but that index is unnecessarily big if one will never use c or d in
the
> search. The nice thing about covering indexes as described for SQLite 4
and
> implemented in MSSQL is that you can specify additional columns that just
> come along for the ride, but are not part of the indexed data:
>
> CREATE INDEX cover1 ON table1(a,b) COVERING(c,d);
>
> Yes, you can do that by also indexing c and d as of 9.2, but it might be
nice to
> be able to include them in the index as additional row data without
actually
> indexing them.
>
> Best,
>
> David

Concretely, I would presume that the contents of a covering index could then
look like the following (a,b,c,d):

(2,1,2,A)
(2,1,5,A) <-- the 5 is out of natural order but exists in the "covering"
part
(2,1,3,A)

Whereas PostgreSQL would be forced to have the index ordered as such:

(2,1,2,A)
(2,1,3,A)
(2,1,5,A)

Either way the data in "c" and "d" are IN THE INDEX otherwise in neither
case could the data values be returned while strictly querying the index.

So the question that needs to be asked is what kind of performance increase
can be had during DML (insert/update) statements and whether those gains are
worth pursuing. Since these other engines appear to allow both cases you
should be able to get at least a partial idea of the performance gains
between "index (a,b,c,d)" and "index (a,b) covering (c,d)".

Vik's concurrent point regarding "non-indexable" values makes some sense but
the use case there seems specialized as I suspect that in the general case
values that are non-indexable (if there truly are any) are generally those
that would be too large to warrant sticking into an index in the first
place. But, XML values do ring true in my mind (particularly frequently
used fragments that are generally quite small). But again whether that is a
reasonable use case for a covering index I do not know. It feels like
trying to solve the remaining 10% when it took a long while to even muster
up enough support and resources to solve the 90%.

David J.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2012-07-17 17:00:37 Re: Covering Indexes
Previous Message Vik Reykja 2012-07-17 16:19:43 Re: Covering Indexes