Re: WIP: Covering + unique indexes.

From: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
To: Andres Freund <andres(at)anarazel(dot)de>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: pgsql-hackers(at)postgresql(dot)org, Anastasia Lubennikova <lubennikovaav(at)gmail(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com>, David Steele <david(at)pgmasters(dot)net>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Aleksander Alekseev <a(dot)alekseev(at)postgrespro(dot)ru>
Subject: Re: WIP: Covering + unique indexes.
Date: 2017-03-31 17:40:59
Message-ID: a976fff4-29a9-c799-5c2d-19a371031325@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

30.03.2017 22:11, Andres Freund
> Any objection from reviewers to push both patches?

First of all, I want to thank you and Robert for reviewing this patch.
Your expertise in postgres subsystems is really necessary for features
like this.
I just wonder, why don't you share your thoughts and doubts till the
"last call".

>> diff --git a/contrib/bloom/blutils.c b/contrib/bloom/blutils.c
>> index f2eda67..59029b9 100644
>> --- a/contrib/bloom/blutils.c
>> +++ b/contrib/bloom/blutils.c
>> @@ -120,6 +120,7 @@ blhandler(PG_FUNCTION_ARGS)
>> amroutine->amclusterable = false;
>> amroutine->ampredlocks = false;
>> amroutine->amcanparallel = false;
>> + amroutine->amcaninclude = false;
> That name doesn't strike me as very descriptive.
>
The feature is "index with included columns", it uses keyword "INCLUDE".
So the name looks good to me.
Any suggestions?
>> + <term><literal>INCLUDE</literal></term>
>> + <listitem>
>> + <para>
>> + An optional <literal>INCLUDE</> clause allows a list of columns to be
>> + specified which will be included in the non-key portion of the index.
>> + Columns which are part of this clause cannot also exist in the
>> + key columns portion of the index, and vice versa. The
>> + <literal>INCLUDE</> columns exist solely to allow more queries to benefit
>> + from <firstterm>index-only scans</> by including certain columns in the
>> + index, the value of which would otherwise have to be obtained by reading
>> + the table's heap. Having these columns in the <literal>INCLUDE</> clause
>> + in some cases allows <productname>PostgreSQL</> to skip the heap read
>> + completely. This also allows <literal>UNIQUE</> indexes to be defined on
>> + one set of columns, which can include another set of columns in the
>> + <literal>INCLUDE</> clause, on which the uniqueness is not enforced.
>> + It's the same with other constraints (PRIMARY KEY and EXCLUDE). This can
>> + also can be used for non-unique indexes as any columns which are not required
>> + for the searching or ordering of records can be used in the
>> + <literal>INCLUDE</> clause, which can slightly reduce the size of the index.
>> + Currently, only the B-tree access method supports this feature.
>> + Expressions as included columns are not supported since they cannot be used
>> + in index-only scans.
>> + </para>
>> + </listitem>
>> + </varlistentry>
> This could use some polishing.
Definitely. But do you have any specific proposals?

>> +/*
>> + * Reform index tuple. Truncate nonkey (INCLUDE) attributes.
>> + */
>> +IndexTuple
>> +index_truncate_tuple(Relation idxrel, IndexTuple olditup)
>> +{
>> + TupleDesc itupdesc = RelationGetDescr(idxrel);
>> + Datum values[INDEX_MAX_KEYS];
>> + bool isnull[INDEX_MAX_KEYS];
>> + IndexTuple newitup;
>> + int indnatts = IndexRelationGetNumberOfAttributes(idxrel);
>> + int indnkeyatts = IndexRelationGetNumberOfKeyAttributes(idxrel);
>> +
>> + Assert(indnatts <= INDEX_MAX_KEYS);
>> + Assert(indnkeyatts > 0);
>> + Assert(indnkeyatts < indnatts);
>> +
>> + index_deform_tuple(olditup, itupdesc, values, isnull);
>> +
>> + /* form new tuple that will contain only key attributes */
>> + itupdesc->natts = indnkeyatts;
>> + newitup = index_form_tuple(itupdesc, values, isnull);
>> + newitup->t_tid = olditup->t_tid;
>> +
>> + itupdesc->natts = indnatts;
> Uh, isn't this a *seriously* bad idea? If index_form_tuple errors out,
> this'll corrupt the tuple descriptor.
Initial reasoning was something like this:

> Maybe it would be better to modify index_form_tuple() to accept a new
> argument with a number of attributes, then you can just Assert that
> this number is never higher than the number of attributes in the
> TupleDesc.
Good point.
I agree that this function is a bit strange. I have to set
tupdesc->nattrs to support compatibility with index_form_tuple().
I didn't want to add neither a new field to tupledesc nor a new
parameter to index_form_tuple(), because they are used widely.

But I haven't considered the possibility of index_form_tuple() failure.
Fixed in this version of the patch. Now it creates a copy of tupledesc
to pass it to index_form_tuple.

> Maybe also rename the function to index_build_key_tuple()?
We'd discussed with other reviewers, they suggested
index_truncate_tuple() instead of index_reform_tuple().
I think that this name reflects the essence of the function clear enough
and don't feel like renaming it again.

>> @@ -368,7 +370,7 @@ systable_beginscan(Relation heapRelation,
>> {
>> int j;
>>
>> - for (j = 0; j < irel->rd_index->indnatts; j++)
>> + for (j = 0; j < IndexRelationGetNumberOfAttributes(irel); j++)
>> {
>> if (key[i].sk_attno == irel->rd_index->indkey.values[j])
>> {
>> @@ -376,7 +378,7 @@ systable_beginscan(Relation heapRelation,
>> break;
>> }
>> }
>> - if (j == irel->rd_index->indnatts)
>> + if (j == IndexRelationGetNumberOfAttributes(irel))
>> elog(ERROR, "column is not in index");
>> }
> Not that it matters overly much, but why are we doing this for all
> attributes, rather than just key attributes?
Since we don't use included columns for system indexes, there is no
difference. I've just tried to minimize code changes here.
>> --- a/src/backend/bootstrap/bootstrap.c
>> +++ b/src/backend/bootstrap/bootstrap.c
>> @@ -600,7 +600,7 @@ boot_openrel(char *relname)
>> relname, (int) ATTRIBUTE_FIXED_PART_SIZE);
>>
>> boot_reldesc = heap_openrv(makeRangeVar(NULL, relname, -1), NoLock);
>> - numattr = boot_reldesc->rd_rel->relnatts;
>> + numattr = RelationGetNumberOfAttributes(boot_reldesc);
>> for (i = 0; i < numattr; i++)
>> {
>> if (attrtypes[i] == NULL)
> That seems a bit unrelated.
I've replaced all the references to relnatts with macro, primarily to
ensure that I won't miss anything that should use only key attributes.
>> @@ -2086,7 +2086,8 @@ StoreRelCheck(Relation rel, char *ccname, Node *expr,
>> is_validated,
>> RelationGetRelid(rel), /* relation */
>> attNos, /* attrs in the constraint */
>> - keycount, /* # attrs in the constraint */
>> + keycount, /* # key attrs in the constraint */
>> + keycount, /* # total attrs in the constraint */
>> InvalidOid, /* not a domain constraint */
>> InvalidOid, /* no associated index */
>> InvalidOid, /* Foreign key fields */
> It doesn't quite seem right to me to store this both in pg_index and
> pg_constraint.
>
Initially, I did to provide pg_get_constraintdef_worker() with info
about included columns.
Maybe it can be solved in some other way, but for now it is a tested and
working implementation.

>>
>> +opt_c_include: INCLUDE '(' columnList ')' { $$ = $3; }
>> + | /* EMPTY */ { $$ = NIL; }
>> + ;
>> +opt_include: INCLUDE '(' index_including_params ')' { $$ = $3; }
>> + | /* EMPTY */ { $$ = NIL; }
>> + ;
>> +
>> +index_including_params: index_elem { $$ = list_make1($1); }
>> + | index_including_params ',' index_elem { $$ = lappend($1, $3); }
>> + ;
>> +
> Why do we have multiple different definitions of this?

Hm,
columnList contains entries of columnElem type and
index_including_params works with index_elem.
Is there a way they can be combined?

>> + keytup = index_truncate_tuple(wstate->index, oitup);
> The code in _bt_split previously claimed that it's the only place doing
> truncation...
To be exact, it claimed that regarding insertion of new values, not
index build.
> It's the only point in insertion process, where we perform truncation

Other comments about code format, spelling and comments are fixed in
attached patches.
Thank you again for reviewing.

--
Anastasia Lubennikova
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
0001-covering-core_v1.patch text/x-diff 102.2 KB
0002-covering-btree_v1.patch text/x-diff 42.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2017-03-31 17:44:52 Re: Allow to specify #columns in heap/index_form_tuple
Previous Message Tom Lane 2017-03-31 17:40:00 Re: REFERENCES privilege should not be symmetric (was Re: [GENERAL] Postgres Permissions Article)