Re: WIP: Covering + unique indexes.

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: Covering + unique indexes.
Date: 2016-01-04 08:49:39
Message-ID: CAKJS1f_oR+P_c+u8PtdEFyWPi4NP-vt69sQGYXTv9NQ48U5JiA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2 December 2015 at 01:53, Anastasia Lubennikova <
a(dot)lubennikova(at)postgrespro(dot)ru> wrote:

> Finally, completed patch "covering_unique_3.0.patch" is here.
> It includes the functionality discussed above in the thread, regression
> tests and docs update.
> I think it's quite ready for review.
>

Hi Anastasia,

I've maybe mentioned before that I think this is a great feature and I
think it will be very useful to have, so I've signed up to review the
patch, and below is the results of my first pass from reading the code.
Apologies if some of the things seem like nitpicks, I've basically just
listed everything I've noticed during, no matter how small.

- int natts = rel->rd_rel->relnatts;
+ int nkeyatts = rel->rd_index->indnkeyatts;
+
+ Assert (rel->rd_index != NULL);
+ Assert(rel->rd_index->indnatts != 0);
+ Assert(rel->rd_index->indnkeyatts != 0);
+
SnapshotData SnapshotDirty;

There's a couple of problems here. According to [1] the C code must follow
the C89 standard, but this appears not to. You have some statements before
the final variable declaration, and also there's a problem as you're
Asserting that rel->rd_index != NULL after already trying to dereference it
in the assignment to nkeyatts, which makes the Assert() useless.

+ An access method that supports this feature sets
<structname>pg_am</>.<structfield>amcanincluding</> true.

I don't think this belongs under the "Index Uniqueness Checks" title. I
think the "Columns included with clause INCLUDING aren't used to enforce
uniqueness." that you've added before it is a good idea, but perhaps the
details of amcanincluding are best explained elsewhere.

- indexed columns are equal in multiple rows.
+ indexed columns are equal in multiple rows. Columns included with clause
+ INCLUDING aren't used to enforce constraints (UNIQUE, PRIMARY KEY,
etc).

<literal> is missing around "INCLUDING" here. Perhaps this part needs more
explanation in a new paragraph. Likely it's good idea to also inform the
reader that the columns which are part of the INCLUDING clause exist only
to allow the query planner to skip having to perform a lookup to the heap
when all of the columns required for the relation are present in the
indexed columns, or in the INCLUDING columns. I think you should explain
that the index can also only be used as pre-sorted input for columns which
are in the "indexed columns" part of the index, and the INCLUDING column
are not searchable as index quals.

--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -137,7 +137,6 @@ CheckIndexCompatible(Oid oldId,
Relation irel;
int i;
Datum d;
-
/* Caller should already have the relation locked in some way. */

You've accidentally removed an empty line here.

+ /*
+ * All information about key and included cols is in numberOfKeyAttributes
number.
+ * So we can concat all index params into one list.
+ */
+ stmt->indexParams = list_concat(stmt->indexParams,
stmt->indexIncludingParams);

I think this should be explained with a better comment, perhaps:

/*
* We append any INCLUDING columns onto the indexParams list so that
* we have one list with all columns. Later we can determine which of these
* are indexed, and which are just part of the INCLUDING list by check the
list
* position. A list item in a position less than ii_NumIndexKeyAttrs is
part of
* the indexed columns, and anything equal to and over is part of the
* INCLUDING columns.
*/

+ stack = _bt_search(rel, IndexRelationGetNumberOfKeyAttributes(rel),
itup_scankey,

This line is longer than 80 chars.

+ /* Truncate nonkey attributes when inserting on nonleaf pages */
+ if (wstate->index->rd_index->indnatts !=
wstate->index->rd_index->indnkeyatts)
+ {
+ BTPageOpaque pageop = (BTPageOpaque) PageGetSpecialPointer(npage);
+
+ if (!P_ISLEAF(pageop))
+ {
+ itup = index_reform_tuple(wstate->index, itup,
wstate->index->rd_index->indnatts, wstate->index->rd_index->indnkeyatts);
+ itupsz = IndexTupleDSize(*itup);
+ itupsz = MAXALIGN(itupsz);
+ }
+ }

A few of the lines here are over 80 chars.

+ This clause specifies additional columns to be appended to the set
of index columns.
+ Included columns don't support any constraints <literal>(UNIQUE,
PRMARY KEY, EXCLUSION CONSTRAINT)</>.
+ These columns can improve the performance of some queries through
using advantages of index-only scan
+ (Or so called <firstterm>covering</firstterm> indexes. Covering
index is the index that
+ covers all columns required in the query and prevents a table
access).
+ Besides that, included attributes are not stored in index inner
pages.
+ It allows to decrease index size and furthermore it provides a way
to extend included
+ columns to store atttributes without suitable opclass (not
implemented yet).
+ This clause could be applied to both unique and nonunique indexes.
+ It's possible to have non-unique covering index, which behaves as
a regular
+ multi-column index with a bit smaller index-size.
+ Currently, only the B-tree access method supports this feature.

"PRMARY KEY" should be "PRIMARY KEY". I ended up rewriting this paragraph
as follows.

"An optional <literal>INCLUDING</> clause allows a list of columns to be
specified which will be included in the index, in the non-key portion of
the index. Columns which are part of this clause cannot also exist in the
indexed columns portion of the index, and vice versa. The
<literal>INCLUDING</> columns exist solely to allow more queries to benefit
from <firstterm>index only scans</> by including certain columns in the
index, the value of which would otherwise have to be obtained by reading
the table's heap. Having these columns in the <literal>INCLUDING</> clause
in some cases allows <productname>PostgreSQL</> to skip the heap read
completely. This also allows <literal>UNIQUE</> indexes to be defined on
one set of columns, which can include another set of column in the
<literal>INCLUDING</> clause, on which the uniqueness is not enforced upon.
This can also be useful for non-unique indexes as any columns which are not
required for the searching or ordering of records can defined in the
<literal>INCLUDING</> clause, which can often reduce the size of the index."

Maybe not perfect, but maybe it's an improvement?

+ To create an unique B-tree index on the column <literal>title</literal>
in

and

+ To create an unique B-tree index on the column <literal>title</literal>

Although "unique" starts with a vowel, "an" is not correct here: This is
best explained in someone else's words:

"The choice between a and an is governed not by whether the next written
letter is a consonant or vowel but by whether the next word begins with the
sound of a vowel or consonant. Unique begins with a "y" sound, hence a
unique is correct."

- int natts = rel->rd_rel->relnatts;
+ int nkeyatts = rel->rd_rel->relnatts;
ScanKey itup_scankey;
BTStack stack;
Buffer buf;
OffsetNumber offset;

+ Assert (rel->rd_index != NULL);
+ Assert(rel->rd_index->indnatts != 0);
+ Assert(rel->rd_index->indnkeyatts != 0);
+ nkeyatts = IndexRelationGetNumberOfKeyAttributes(rel);
+

nkeyatts is assigned twice.

+ /* Truncate nonkey attributes when inserting on nonleaf pages. */
+ if (rel->rd_index->indnatts != rel->rd_index->indnkeyatts)
+ if (!P_ISLEAF(lpageop))
+ itup = index_reform_tuple(rel, itup, rel->rd_index->indnatts,
rel->rd_index->indnkeyatts);

I don't recall having seen any places in the code which skip on the outer
{} braces in this way before, although I can't see anything in the coding
standards which states that this is wrong. In either case, perhaps it's
better to just use an && instead of the extra if (). The assignment line
also exceeds 80 chars.

+/*
+ * Reform index tuple. Truncate nonkey (INCLUDED) attributes.
+ */

I guess "INCLUDED" should be "INCLUDING"? The capitalisation makes me think
you're talking about the syntax.

+ if (!colno || colno == keyno + 1) {
appendStringInfoString(&buf, quote_identifier(attname));
+ if ((attrsOnly)&&(keyno >= idxrec->indnkeyatts))
+ appendStringInfoString(&buf, " (included)");
+ }

The { brace here should be on the next line. I'm also a bit unsure what the
"(included)" is for. There's also surplus parenthesis in the 2nd "if"
statement, and also missing whitespace.

+ bool amcanincluding; /* does AM support INCLUDING columns? */

Perhaps this should be called "amcaninclude". I don't think we really need
to use the same word as is used in the SQL syntax here, do we?
Same for the new column in pg_am.

Perhaps this needs the comment updated from the standard one.

int16 indnatts; /* number of columns in index */

maybe just say /* total number of columns in index */ ?

+ int ii_NumIndexKeyAttrs;

The struct comment needs an entry for ii_NumIndexKeyAttrs.

+ List *indexIncludingParams; /* additional columns to index: a list of
IndexElem */

This should wrap at 80 chars. struct RestrictInfo has some examples of how
this is normally done.

/*
+ * RelationGetNumberOfAttributes
+ * Returns the number of attributes in a relation.
+ */
+#define IndexRelationGetNumberOfKeyAttributes(relation)
((relation)->rd_index->indnkeyatts)
+

Copy paste problem. You missed editing the comment.

I've not tested the patch yet. I will send another email soon with the
results of that.

Thanks for working on this.

[1]
http://www.postgresql.org/docs/devel/static/source-conventions.html#AEN111267

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2016-01-04 08:58:45 Re: POC: Cache data in GetSnapshotData()
Previous Message Andres Freund 2016-01-04 08:45:09 Re: Avoiding pin scan during btree vacuum