Re: WIP: Covering + unique indexes.

From: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: Covering + unique indexes.
Date: 2015-10-09 15:17:55
Message-ID: 5617DAA3.6080408@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

08.10.2015 19:31, Thom Brown пишет:
> On 8 October 2015 at 16:18, Anastasia Lubennikova
> <a(dot)lubennikova(at)postgrespro(dot)ru> wrote:
>> Hi hackers,
>>
>> I'm working on a patch that allows to combine covering and unique
>> functionality for btree indexes.
>>
>> Previous discussion was here:
>> 1) Proposal thread
>> 2) Message with proposal clarification
>>
>> In a nutshell, the feature allows to create index with "key" columns and
>> "included" columns.
>> "key" columns can be used as scan keys. Unique constraint relates only to
>> "key" columns.
>> "included" columns may be used as scan keys if they have suitable opclass.
>> Both "key" and "included" columns can be returned from index by
>> IndexOnlyScan.
>>
>> Btree is the default index and it's used everywhere. So it requires properly
>> testing. Volunteers are welcome)
>>
>> Use case:
>> - We have a table (c1, c2, c3, c4);
>> - We need to have an unique index on (c1, c2).
>> - We would like to have a covering index on all columns to avoid reading of
>> heap pages.
>>
>> Old way:
>> CREATE UNIQUE INDEX olduniqueidx ON oldt USING btree (c1, c2);
>> CREATE INDEX oldcoveringidx ON oldt USING btree (c1, c2, c3, c4);
>>
>> What's wrong?
>> Two indexes contain repeated data. Overhead to data manipulation operations
>> and database size.
>>
>> New way:
>> CREATE UNIQUE INDEX newidx ON newt USING btree (c1, c2) INCLUDING (c3, c4);
>>
>> The patch is attached.
>> In 'test.sql' you can find a test with detailed comments on each step, and
>> comparison of old and new indexes.
>>
>> New feature has following syntax:
>> CREATE UNIQUE INDEX newidx ON newt USING btree (c1, c2) INCLUDING (c3, c4);
>> Keyword INCLUDING defines the "included" columns of index. These columns
>> aren't concern to unique constraint.
>> Also, them are not stored in index inner pages. It allows to decrease index
>> size.
>>
>> Results:
>> 1) Additional covering index is not required anymore.
>> 2) New index can use IndexOnlyScan on queries, where old index can't.
>>
>> For example,
>> explain analyze select c1, c2 from newt where c1<10000 and c3<20;
>>
>> *more examples in 'test.sql'
>>
>> Future work:
>> To do opclasses for "included" columns optional.
>>
>> CREATE TABLE tbl (c1 int, c4 box);
>> CREATE UNIQUE INDEX idx ON tbl USING btree (c1) INCLUDING (c4);
>>
>> If we don't need c4 as an index scankey, we don't need any btree opclass on
>> it.
>> But we still want to have it in covering index for queries like
>>
>> SELECT c4 FROM tbl WHERE c1=1000;
>> SELECT * FROM tbl WHERE c1=1000;
> The definition output needs a space after "INCLUDING":
>
> # SELECT pg_get_indexdef('people_first_name_last_name_email_idx'::regclass::oid);
> pg_get_indexdef
> --------------------------------------------------------------------------------------------------------------------------
> CREATE UNIQUE INDEX people_first_name_last_name_email_idx ON people
> USING btree (first_name, last_name) INCLUDING(email)
> (1 row)
>
>
> There is also no collation output:
>
> # CREATE UNIQUE INDEX test_idx ON people (first_name COLLATE "en_GB",
> last_name) INCLUDING (email COLLATE "pl_PL");
> CREATE INDEX
>
> # SELECT pg_get_indexdef('test_idx'::regclass::oid);
> pg_get_indexdef
> -------------------------------------------------------------------------------------------------------------
> CREATE UNIQUE INDEX test_idx ON people USING btree (first_name
> COLLATE "en_GB", last_name) INCLUDING(email)
> (1 row)
>
>
> As for functioning, it works as described:
>
> # EXPLAIN SELECT email FROM people WHERE (first_name,last_name) =
> ('Paul','Freeman');
> QUERY PLAN
> ----------------------------------------------------------------------------------------------------------
> Index Only Scan using people_first_name_last_name_email_idx on people
> (cost=0.28..1.40 rows=1 width=21)
> Index Cond: ((first_name = 'Paul'::text) AND (last_name = 'Freeman'::text))
> (2 rows)
>
>
> Typo:
>
> "included columns must not intersects with key columns"
>
> should be:
>
> "included columns must not intersect with key columns"

Thank you for testing. Mentioned issues are fixed.

> One thing I've noticed you can do with your patch, which you haven't
> mentioned, is have a non-unique covering index:
>
> # CREATE INDEX covering_idx ON people (first_name) INCLUDING (last_name);
> CREATE INDEX
>
> # EXPLAIN SELECT first_name, last_name FROM people WHERE first_name = 'Paul';
> QUERY PLAN
> ---------------------------------------------------------------------------------
> Index Only Scan using covering_idx on people (cost=0.28..1.44 rows=4 width=13)
> Index Cond: (first_name = 'Paul'::text)
> (2 rows)
>
> But this appears to behave as if it were a regular multi-column index,
> in that it will use the index for ordering rather than sort after
> fetching from the index. So is this really stored the same as a
> multi-column index? The index sizes aren't identical, so something is
> different.

Yes, it behaves as a regular multi-column index.
Index sizes are different, because included attributes are not stored in
index inner pages.
It allows to decrease index size. I don't sure that it doesn't decrease
search speed.
But I assumed that we are never execute search on included columns
without clause on key columns.
So it must be not too costly to recheck included attributes on leaf pages.

Furthermore, it's a first step of work on "optional oplasses for
included columns".
If attribute hasn't opclass, we certainly don't need to store it in
inner index page.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
covering_unique_2.0.patch text/x-patch 34.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2015-10-09 15:27:02 Re: Some questions about the array.
Previous Message Stephen Frost 2015-10-09 14:54:59 Re: Multi-tenancy with RLS