Re: WIP: Covering + unique indexes.

From: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
To: Andrey Borodin <amborodin(at)acm(dot)org>, pgsql-hackers(at)postgresql(dot)org
Cc: Anastasia Lubennikova <lubennikovaav(at)gmail(dot)com>
Subject: Re: WIP: Covering + unique indexes.
Date: 2016-08-15 14:45:58
Message-ID: f90aa60a-b67f-95b5-d9f5-f5d8ced178c6@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

14.08.2016 20:11, Andrey Borodin:
> The following review has been posted through the commitfest application:
> make installcheck-world: tested, passed
> Implements feature: tested, failed
> Spec compliant: tested, passed
> Documentation: tested, passed
>
> Hi hackers!
>
> I've read the patch and here is my code review.
>
> ==========PURPOSE============
> I've used this feature from time to time with MS SQL. From my experience INCLUDE is a 'sugar on top' feature.
> Some MS SQL classes do not even mention INCLUDE despite it's there from 2005 (though classes do not mention lots of important things, so it's not kind of valuable indicator).
> But those who use it, use it whenever possible. For example, system view with recommended indices rarely list one without INCLUDE columns.
> So, this feature is very important from perspective of converting MS SQL DBAs to PostgreSQL. This is how I see it.

Thank you for the review, I hope this feature will be useful for many
people.

> ========SUGGESTIONS==========
> 0. Index build is broken. This script https://github.com/x4m/pggistopt/blob/8ad65d2e305e98c836388a07909af5983dba9c73/test.sql SEGFAULTs and may cause situation when you cannot insert anything into table (I think drop of index would help, but didn't tested this)

Thank you for reporting. That was a bug caused by high key truncation,
that occurs
when index has more than 3 levels.
Fixed. See attached file.

> 1. I think MS SQL syntax INCLUDE instead of INCLUDING would be better (for a purpose listed above)

I've chosen this particular name to avoid using of new keyword. We
already have INCLUDING
in postgres in a context of inheritance that will never intersect with
covering indexes.
I'm sure it won't be a big problem of migration from MsSQL.

> 2. Empty line added in ruleutils.c. Is it for a reason?

No, just a missed line.
Fixed.

> 3. Now we have indnatts and indnkeyatts instead of indnatts. I think it is worth considering renaming indnatts to something different from old name. Someone somewhere could still suppose it's a number of keys.

I agree that naming became vague after this patch.
I've already suggested to replace "indkeys[]" with more specific name, and
AFAIR there was no reaction. So I didn't do that.
But I don't sure about your suggestion regarding indnatts. Old queries
(and old indexes)
can still use it correctly. I don't see a reason to break compatibility
for all users.
Those who will use this new feature, should ensure that their queries to
pg_index
behave as expected.

> ========PERFORMANCE==========
> Due to suggestion number 0 I could not measure performance of index build. Index crashes when there's more than 1.1 million of rows in a table.
> Performance test script is here https://github.com/x4m/pggistopt/blob/f206b4395baa15a2fa42897eeb27bd555619119a/test.sql
> Test scenario is following:
> 1. Create table, then create index, then add data.
> 2. Make a query touching data in INCLUDING columns.
> This scenario is tested against table with:
> A. Table with index, that do not contain touched columns, just PK.
> B. Index with all columns in index.
> C. Index with PK in keys and INCLUDING all other columns.
>
> Tests were executed 5 times on Ubuntu VM under Hyper-V i5 2500 CPU, 16 Gb of RAM, SSD disk.
> Time to insert 10M rows:
> A. AVG 110 seconds STD 4.8
> B. AVG 121 seconds STD 2.0
> C. AVG 111 seconds STD 5.7
> Inserts to INCLUDING index is almost as fast as inserts to index without extra columns.
>
> Time to run SELECT query:
> A. AVG 2864 ms STD 794
> B. AVG 2329 ms STD 84
> C. AVG 2293 ms STD 58
> Selects with INCLUDING columns is almost as fast as with full index.
>
> Index size (deterministic measure, STD = 0)
> A. 317 MB
> B. 509 MB
> C. 399 MB
> Index size is in the middle between full index and minimal index.
>
> I think this numbers agree with expectation from the feature.
>
> ========CONCLUSION==========
> This patch brings useful and important feature. Build shall be repaired; other my suggestions are only suggestions.
>
>
>
> Best regards, Andrey Borodin, Octonica & Ural Federal University.
>
> The new status of this patch is: Waiting on Author
>

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
including_columns_9.7_v1.patch text/x-patch 135.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message amul sul 2016-08-15 14:56:34 Re: Bug in to_timestamp().
Previous Message Tom Lane 2016-08-15 14:19:12 Re: Why --backup-and-modify-in-place in perltidy config?