Re: WIP: Covering + unique indexes.

From: Andrey Borodin <amborodin(at)acm(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Anastasia Lubennikova <lubennikovaav(at)gmail(dot)com>
Subject: Re: WIP: Covering + unique indexes.
Date: 2016-08-14 17:11:31
Message-ID: 20160814171131.21390.75752.pgcf@coridan.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, failed
Spec compliant: tested, passed
Documentation: tested, passed

Hi hackers!

I've read the patch and here is my code review.

==========PURPOSE============
I've used this feature from time to time with MS SQL. From my experience INCLUDE is a 'sugar on top' feature.
Some MS SQL classes do not even mention INCLUDE despite it's there from 2005 (though classes do not mention lots of important things, so it's not kind of valuable indicator).
But those who use it, use it whenever possible. For example, system view with recommended indices rarely list one without INCLUDE columns.
So, this feature is very important from perspective of converting MS SQL DBAs to PostgreSQL. This is how I see it.

========SUGGESTIONS==========
0. Index build is broken. This script https://github.com/x4m/pggistopt/blob/8ad65d2e305e98c836388a07909af5983dba9c73/test.sql SEGFAULTs and may cause situation when you cannot insert anything into table (I think drop of index would help, but didn't tested this)
1. I think MS SQL syntax INCLUDE instead of INCLUDING would be better (for a purpose listed above)
2. Empty line added in ruleutils.c. Is it for a reason?
3. Now we have indnatts and indnkeyatts instead of indnatts. I think it is worth considering renaming indnatts to something different from old name. Someone somewhere could still suppose it's a number of keys.

========PERFORMANCE==========
Due to suggestion number 0 I could not measure performance of index build. Index crashes when there's more than 1.1 million of rows in a table.
Performance test script is here https://github.com/x4m/pggistopt/blob/f206b4395baa15a2fa42897eeb27bd555619119a/test.sql
Test scenario is following:
1. Create table, then create index, then add data.
2. Make a query touching data in INCLUDING columns.
This scenario is tested against table with:
A. Table with index, that do not contain touched columns, just PK.
B. Index with all columns in index.
C. Index with PK in keys and INCLUDING all other columns.

Tests were executed 5 times on Ubuntu VM under Hyper-V i5 2500 CPU, 16 Gb of RAM, SSD disk.
Time to insert 10M rows:
A. AVG 110 seconds STD 4.8
B. AVG 121 seconds STD 2.0
C. AVG 111 seconds STD 5.7
Inserts to INCLUDING index is almost as fast as inserts to index without extra columns.

Time to run SELECT query:
A. AVG 2864 ms STD 794
B. AVG 2329 ms STD 84
C. AVG 2293 ms STD 58
Selects with INCLUDING columns is almost as fast as with full index.

Index size (deterministic measure, STD = 0)
A. 317 MB
B. 509 MB
C. 399 MB
Index size is in the middle between full index and minimal index.

I think this numbers agree with expectation from the feature.

========CONCLUSION==========
This patch brings useful and important feature. Build shall be repaired; other my suggestions are only suggestions.

Best regards, Andrey Borodin, Octonica & Ural Federal University.

The new status of this patch is: Waiting on Author

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2016-08-14 17:28:41 Re: numeric data type upper limit.
Previous Message Ryan Murphy 2016-08-14 15:12:45 Patch: initdb: "'" for QUOTE_PATH (non-windows)