Re: WIP: Covering + unique indexes.

From: Thom Brown <thom(at)linux(dot)com>
To: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: Covering + unique indexes.
Date: 2015-10-08 16:31:29
Message-ID: CAA-aLv7mWdu5WCrUH0sw79abDsq4tdZEpkvV8SfgNyLp92+z6Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 8 October 2015 at 16:18, Anastasia Lubennikova
<a(dot)lubennikova(at)postgrespro(dot)ru> wrote:
>
> Hi hackers,
>
> I'm working on a patch that allows to combine covering and unique
> functionality for btree indexes.
>
> Previous discussion was here:
> 1) Proposal thread
> 2) Message with proposal clarification
>
> In a nutshell, the feature allows to create index with "key" columns and
> "included" columns.
> "key" columns can be used as scan keys. Unique constraint relates only to
> "key" columns.
> "included" columns may be used as scan keys if they have suitable opclass.
> Both "key" and "included" columns can be returned from index by
> IndexOnlyScan.
>
> Btree is the default index and it's used everywhere. So it requires properly
> testing. Volunteers are welcome)
>
> Use case:
> - We have a table (c1, c2, c3, c4);
> - We need to have an unique index on (c1, c2).
> - We would like to have a covering index on all columns to avoid reading of
> heap pages.
>
> Old way:
> CREATE UNIQUE INDEX olduniqueidx ON oldt USING btree (c1, c2);
> CREATE INDEX oldcoveringidx ON oldt USING btree (c1, c2, c3, c4);
>
> What's wrong?
> Two indexes contain repeated data. Overhead to data manipulation operations
> and database size.
>
> New way:
> CREATE UNIQUE INDEX newidx ON newt USING btree (c1, c2) INCLUDING (c3, c4);
>
> The patch is attached.
> In 'test.sql' you can find a test with detailed comments on each step, and
> comparison of old and new indexes.
>
> New feature has following syntax:
> CREATE UNIQUE INDEX newidx ON newt USING btree (c1, c2) INCLUDING (c3, c4);
> Keyword INCLUDING defines the "included" columns of index. These columns
> aren't concern to unique constraint.
> Also, them are not stored in index inner pages. It allows to decrease index
> size.
>
> Results:
> 1) Additional covering index is not required anymore.
> 2) New index can use IndexOnlyScan on queries, where old index can't.
>
> For example,
> explain analyze select c1, c2 from newt where c1<10000 and c3<20;
>
> *more examples in 'test.sql'
>
> Future work:
> To do opclasses for "included" columns optional.
>
> CREATE TABLE tbl (c1 int, c4 box);
> CREATE UNIQUE INDEX idx ON tbl USING btree (c1) INCLUDING (c4);
>
> If we don't need c4 as an index scankey, we don't need any btree opclass on
> it.
> But we still want to have it in covering index for queries like
>
> SELECT c4 FROM tbl WHERE c1=1000;
> SELECT * FROM tbl WHERE c1=1000;

The definition output needs a space after "INCLUDING":

# SELECT pg_get_indexdef('people_first_name_last_name_email_idx'::regclass::oid);
pg_get_indexdef
--------------------------------------------------------------------------------------------------------------------------
CREATE UNIQUE INDEX people_first_name_last_name_email_idx ON people
USING btree (first_name, last_name) INCLUDING(email)
(1 row)

There is also no collation output:

# CREATE UNIQUE INDEX test_idx ON people (first_name COLLATE "en_GB",
last_name) INCLUDING (email COLLATE "pl_PL");
CREATE INDEX

# SELECT pg_get_indexdef('test_idx'::regclass::oid);
pg_get_indexdef
-------------------------------------------------------------------------------------------------------------
CREATE UNIQUE INDEX test_idx ON people USING btree (first_name
COLLATE "en_GB", last_name) INCLUDING(email)
(1 row)

As for functioning, it works as described:

# EXPLAIN SELECT email FROM people WHERE (first_name,last_name) =
('Paul','Freeman');
QUERY PLAN
----------------------------------------------------------------------------------------------------------
Index Only Scan using people_first_name_last_name_email_idx on people
(cost=0.28..1.40 rows=1 width=21)
Index Cond: ((first_name = 'Paul'::text) AND (last_name = 'Freeman'::text))
(2 rows)

Typo:

"included columns must not intersects with key columns"

should be:

"included columns must not intersect with key columns"

One thing I've noticed you can do with your patch, which you haven't
mentioned, is have a non-unique covering index:

# CREATE INDEX covering_idx ON people (first_name) INCLUDING (last_name);
CREATE INDEX

# EXPLAIN SELECT first_name, last_name FROM people WHERE first_name = 'Paul';
QUERY PLAN
---------------------------------------------------------------------------------
Index Only Scan using covering_idx on people (cost=0.28..1.44 rows=4 width=13)
Index Cond: (first_name = 'Paul'::text)
(2 rows)

But this appears to behave as if it were a regular multi-column index,
in that it will use the index for ordering rather than sort after
fetching from the index. So is this really stored the same as a
multi-column index? The index sizes aren't identical, so something is
different.

Thom

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2015-10-08 16:33:54 Re: [PATCH] minor doc patch
Previous Message Robert Haas 2015-10-08 16:27:40 Re: [PATCH] Comment fixes