Re: WIP: Covering + unique indexes

From: Brad DeJong <Brad(dot)Dejong(at)infor(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Cc: "lubennikovaav(at)gmail(dot)com" <lubennikovaav(at)gmail(dot)com>
Subject: Re: WIP: Covering + unique indexes
Date: 2016-11-14 08:17:51
Message-ID: F8F0ED16CB59F247B7EFD0E1DB34BC1F5CB56977@USALWEXMBX3.infor.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Anastasia, et al,

This is a review of including_columns_9.7_v5.patch.

I looked through the commit fest list and this patch was interesting and I wanted to try it.

I have used include columns on Microsoft SQL Server; DB2 for Linux, Unix, Windows; and DB2 for z/OS.

After reading the e-mail discussions, there are still points that I am not clear on.

Given "create table foo (a int, b int, c int, d int)" and "create unique index foo_a_b on foo (a, b) including (c)".

index only? heap tuple needed?
select a, b, c from foo where a = 1 yes no
select a, b, d from foo where a = 1 no yes
select a, b from foo where a = 1 and c = 1 ? ?

It was clear from the discussion that the index scan/access method would not resolve the "c = 1" predicate but it was not clear if "c = 1" could be resolved without accessing the heap tuple.

Are included columns counted against the 32 column and 2712 byte index limits? I did not see either explicitly mentioned in the discussion or the documentation. I only ask because in SQL Server the limits are different for include columns.

1. syntax - on 2016-08-14, Andrey Borodin wrote "I think MS SQL syntax INCLUDE instead of INCLUDING would be better". I would go further than that. This feature is already supported by 2 of the top 5 SQL databases and they both use INCLUDE. Using different syntax because of an internal implementation detail seems short sighted.

2. The patch does not seem to apply cleanly anymore.

> git checkout master
Already on 'master'
> git pull
From http://git.postgresql.org/git/postgresql
d49cc58..06f5fd2 master -> origin/master
Updating d49cc58..06f5fd2
Fast-forward
src/include/port.h | 2 +-
src/port/dirmod.c | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)
> patch -pl < including_columns_9.7_v5.patch
patching file contrib/dblink/dblink.c
...
patching file src/backend/parser/gram.y
...
Hunk #2 FAILED at 375.
...
1 out of 12 hunks FAILED -- saving rejects to file src/backend/parser/gram.y.rej
...
patching file src/bin/pg_dump/pg_dump.c
...
Hunk #8 FAILED at 6399.
Hunk #9 FAILED at 6426.
...
2 out of 13 hunks FAILED -- saving rejects to file src/bin/pg_dump/pg_dump.c.rej
...

3. After "fixing" compilation errors (best guess based on similar change in other location), "make check" failed.

> make check
...
parallel group (3 tests): index_including create_view create_index
create_index ... ok
create_view ... ok
index_including ... FAILED
...

Looking at regression.diffs, it looks like the output format of \d tbl changed (lines 288,300) from the expected "Column | Type | Modifiers" to "Column | Type | Collation | Nullable | Default".

4. documentation - minor items (these are not actual diffs)
create_index.sgml
- [ INCLUDING ( <replaceable class="parameter">column_name</replaceable> )]
+ [ INCLUDING ( <replaceable class="parameter">column_name</replaceable> [, ...] )]

An optional <literal>INCLUDING</> clause allows a list of columns to be
- specified which will be included in the index, in the non-key portion of the index.
+ specified which will be included in the non-key portion of the index.

The whole paragraph on INCLUDING does not include many of the points raised in the feature discussions.

create_table.sgml (for both UNIQUE and PRIMARY KEY constraints) (similar change could be made to nbtree/README)
- Optional clause <literal>INCLUDING</literal> allows to add into the index
- a portion of columns on which the uniqueness is not enforced upon.
- Note, that althogh constraint is not enforced on included columns, it still
- depends on them. Consequently, some operations on these columns (e.g. <literal>DROP COLUMN</literal>)
- can cause cascade constraint and index deletion.

+ An optional <literal>INCLUDING</literal> clause allows a list of columns
+ to be specified which will be included in the non-key portion of the index.
+ Although uniqueness is not enforced on the included columns, the constraint
+ still depends on them. Consequently, some operations on the included columns
+ (e.g. <literal>DROP COLUMN</literal>) can cause cascading constraint and index deletion.

indexcmds.c
- * are key columns, and which are just part of the INCLUDING list by check
+ * are key columns, and which are just part of the INCLUDING list by checking

ruleutils.c
- * meaningless there, so do not include them into the message.
+ * meaningless there, so do not include them in the message.

pg_dump.c (does "if (fout->remoteVersion >= 90600)" also need to change?)
- * In 9.6 we add INCLUDING columns functionality
- * that requires new fields to be added.
+ * In 10.0 we added INCLUDING columns functionality
+ * that required new fields to be added.

5. coding
parse_utilcmd.c
@@ -1334,6 +1334,38 @@ ...
The loop is handling included columns separately.
The loop adds the collation name for each included column if it is not the default.

Q: Given that the create index/create constraint syntax does not allow a collation to be specified for included columns, how can you ever have a non-default collation?

@@ -1776,6 +1816,7 @@
The comment here says "NOTE that exclusion constraints don't support included nonkey attributes". However, the paragraph on INCLUDING in create_index.sgml says "It's the same for the other constraints (PRIMARY KEY and EXCLUDE)".

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2016-11-14 08:39:36 Re: Quorum commit for multiple synchronous replication.
Previous Message Tsunakawa, Takayuki 2016-11-14 08:07:31 Re: Patch: Implement failover on libpq connect level.