Re: WIP: Covering + unique indexes.

From: Andres Freund <andres(at)anarazel(dot)de>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: pgsql-hackers(at)postgresql(dot)org, Anastasia Lubennikova <lubennikovaav(at)gmail(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com>, David Steele <david(at)pgmasters(dot)net>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Aleksander Alekseev <a(dot)alekseev(at)postgrespro(dot)ru>
Subject: Re: WIP: Covering + unique indexes.
Date: 2017-03-30 19:11:38
Message-ID: 20170330191138.zawsxgsxnt2qy2as@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2017-03-30 18:26:05 +0300, Teodor Sigaev wrote:
> Any objection from reviewers to push both patches?

> diff --git a/contrib/bloom/blutils.c b/contrib/bloom/blutils.c
> index f2eda67..59029b9 100644
> --- a/contrib/bloom/blutils.c
> +++ b/contrib/bloom/blutils.c
> @@ -120,6 +120,7 @@ blhandler(PG_FUNCTION_ARGS)
> amroutine->amclusterable = false;
> amroutine->ampredlocks = false;
> amroutine->amcanparallel = false;
> + amroutine->amcaninclude = false;

That name doesn't strike me as very descriptive.

> + <term><literal>INCLUDE</literal></term>
> + <listitem>
> + <para>
> + An optional <literal>INCLUDE</> clause allows a list of columns to be
> + specified which will be included in the non-key portion of the index.
> + Columns which are part of this clause cannot also exist in the
> + key columns portion of the index, and vice versa. The
> + <literal>INCLUDE</> columns exist solely to allow more queries to benefit
> + from <firstterm>index-only scans</> by including certain columns in the
> + index, the value of which would otherwise have to be obtained by reading
> + the table's heap. Having these columns in the <literal>INCLUDE</> clause
> + in some cases allows <productname>PostgreSQL</> to skip the heap read
> + completely. This also allows <literal>UNIQUE</> indexes to be defined on
> + one set of columns, which can include another set of columns in the
> + <literal>INCLUDE</> clause, on which the uniqueness is not enforced.
> + It's the same with other constraints (PRIMARY KEY and EXCLUDE). This can
> + also can be used for non-unique indexes as any columns which are not required
> + for the searching or ordering of records can be used in the
> + <literal>INCLUDE</> clause, which can slightly reduce the size of the index.
> + Currently, only the B-tree access method supports this feature.
> + Expressions as included columns are not supported since they cannot be used
> + in index-only scans.
> + </para>
> + </listitem>
> + </varlistentry>

This could use some polishing.

> +/*
> + * Reform index tuple. Truncate nonkey (INCLUDE) attributes.
> + */
> +IndexTuple
> +index_truncate_tuple(Relation idxrel, IndexTuple olditup)
> +{
> + TupleDesc itupdesc = RelationGetDescr(idxrel);
> + Datum values[INDEX_MAX_KEYS];
> + bool isnull[INDEX_MAX_KEYS];
> + IndexTuple newitup;
> + int indnatts = IndexRelationGetNumberOfAttributes(idxrel);
> + int indnkeyatts = IndexRelationGetNumberOfKeyAttributes(idxrel);
> +
> + Assert(indnatts <= INDEX_MAX_KEYS);
> + Assert(indnkeyatts > 0);
> + Assert(indnkeyatts < indnatts);
> +
> + index_deform_tuple(olditup, itupdesc, values, isnull);
> +
> + /* form new tuple that will contain only key attributes */
> + itupdesc->natts = indnkeyatts;
> + newitup = index_form_tuple(itupdesc, values, isnull);
> + newitup->t_tid = olditup->t_tid;
> +
> + itupdesc->natts = indnatts;

Uh, isn't this a *seriously* bad idea? If index_form_tuple errors out,
this'll corrupt the tuple descriptor.

Maybe also rename the function to index_build_key_tuple()?

> * Construct a string describing the contents of an index entry, in the
> * form "(key_name, ...)=(key_value, ...)". This is currently used
> - * for building unique-constraint and exclusion-constraint error messages.
> + * for building unique-constraint and exclusion-constraint error messages,
> + * so only key columns of index are checked and printed.

s/index/the index/

> @@ -368,7 +370,7 @@ systable_beginscan(Relation heapRelation,
> {
> int j;
>
> - for (j = 0; j < irel->rd_index->indnatts; j++)
> + for (j = 0; j < IndexRelationGetNumberOfAttributes(irel); j++)

> {
> if (key[i].sk_attno == irel->rd_index->indkey.values[j])
> {
> @@ -376,7 +378,7 @@ systable_beginscan(Relation heapRelation,
> break;
> }
> }
> - if (j == irel->rd_index->indnatts)
> + if (j == IndexRelationGetNumberOfAttributes(irel))
> elog(ERROR, "column is not in index");
> }

Not that it matters overly much, but why are we doing this for all
attributes, rather than just key attributes?

> --- a/src/backend/bootstrap/bootstrap.c
> +++ b/src/backend/bootstrap/bootstrap.c
> @@ -600,7 +600,7 @@ boot_openrel(char *relname)
> relname, (int) ATTRIBUTE_FIXED_PART_SIZE);
>
> boot_reldesc = heap_openrv(makeRangeVar(NULL, relname, -1), NoLock);
> - numattr = boot_reldesc->rd_rel->relnatts;
> + numattr = RelationGetNumberOfAttributes(boot_reldesc);
> for (i = 0; i < numattr; i++)
> {
> if (attrtypes[i] == NULL)

That seems a bit unrelated.

> @@ -2086,7 +2086,8 @@ StoreRelCheck(Relation rel, char *ccname, Node *expr,
> is_validated,
> RelationGetRelid(rel), /* relation */
> attNos, /* attrs in the constraint */
> - keycount, /* # attrs in the constraint */
> + keycount, /* # key attrs in the constraint */
> + keycount, /* # total attrs in the constraint */
> InvalidOid, /* not a domain constraint */
> InvalidOid, /* no associated index */
> InvalidOid, /* Foreign key fields */

It doesn't quite seem right to me to store this both in pg_index and
pg_constraint.

> @@ -340,14 +341,27 @@ DefineIndex(Oid relationId,
> numberOfAttributes = list_length(stmt->indexParams);
> - if (numberOfAttributes <= 0)
> - ereport(ERROR,
> - (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
> - errmsg("must specify at least one column")));
> +

Huh, why's that check gone?

>
> +opt_c_include: INCLUDE '(' columnList ')' { $$ = $3; }
> + | /* EMPTY */ { $$ = NIL; }
> + ;

> +opt_include: INCLUDE '(' index_including_params ')' { $$ = $3; }
> + | /* EMPTY */ { $$ = NIL; }
> + ;
> +
> +index_including_params: index_elem { $$ = list_make1($1); }
> + | index_including_params ',' index_elem { $$ = lappend($1, $3); }
> + ;
> +

Why do we have multiple different definitions of this?

> @@ -1979,6 +2017,48 @@ transformIndexConstraint(Constraint *constraint, CreateStmtContext *cxt)
> index->indexParams = lappend(index->indexParams, iparam);
> }
>
> + /* Here is some code duplication. But we do need it. */

Aha?

> + foreach(lc, constraint->including)
> + {
> + char *key = strVal(lfirst(lc));
> + bool found = false;
> + ColumnDef *column = NULL;
> + ListCell *columns;
> + IndexElem *iparam;
> +
> + foreach(columns, cxt->columns)
> + {
> + column = (ColumnDef *) lfirst(columns);
> + Assert(IsA(column, ColumnDef));
> + if (strcmp(column->colname, key) == 0)
> + {
> + found = true;
> + break;
> + }
> + }
> +
> + /*
> + * In the ALTER TABLE case, don't complain about index keys not
> + * created in the command; they may well exist already. DefineIndex
> + * will complain about them if not, and will also take care of marking
> + * them NOT NULL.
> + */

Uh. Why should they be marked as NOT NULL? ISTM the comment has been
copied here without adjustments.

> @@ -1275,6 +1275,21 @@ pg_get_indexdef_worker(Oid indexrelid, int colno,
> Oid keycoltype;
> Oid keycolcollation;
>
> + /*
> + * attrsOnly flag is used for building unique-constraint and
> + * exclusion-constraint error messages. Included attrs are
> + * meaningless there, so do not include them in the message.
> + */
> + if (attrsOnly && keyno >= idxrec->indnkeyatts)
> + break;

Sounds like the parameter should be renamed then.

> +Included attributes in B-tree indexes
> +-------------------------------------
> +
> +Since 10.0 there is an optional INCLUDE clause, that allows to add

10.0 isn't right, since that's the "patch" version now.

> +a portion of non-key attributes to index. They exist to allow more queries
> +to benefit from index-only scans. We never use included attributes in
> +ScanKeys, neither for search nor for inserts. That allows us to include
> +into B-tree any datatypes, even those which don't have suitable opclass.
> +Included columns only stored in regular items on leaf pages. All inner
> +keys and high keys are truncated and contain only key attributes.
> +That helps to reduce the size of index.

s/index/the index/

> @@ -537,6 +542,28 @@ _bt_buildadd(BTWriteState *wstate, BTPageState *state, IndexTuple itup)
> ItemIdSetUnused(ii); /* redundant */
> ((PageHeader) opage)->pd_lower -= sizeof(ItemIdData);
>
> + if (indnkeyatts != indnatts && P_ISLEAF(opageop))
> + {
> + /*
> + * It's essential to truncate High key here.
> + * The purpose is not just to save more space on this particular page,
> + * but to keep whole b-tree structure consistent. Subsequent insertions
> + * assume that hikey is already truncated, and so they should not
> + * worry about it, when copying the high key into the parent page
> + * as a downlink.

s/should/need/

> + * NOTE It is not crutial for reliability in present,

s/crutial/crucial/

> + * but maybe it will be that in the future.
> + */

"it's essential" ... "it is not crutial" -- that's contradictory.

> + keytup = index_truncate_tuple(wstate->index, oitup);

The code in _bt_split previously claimed that it's the only place doing
truncation...

> + /* delete "wrong" high key, insert keytup as P_HIKEY. */
> + PageIndexTupleDelete(opage, P_HIKEY);

> + if (!_bt_pgaddtup(opage, IndexTupleSize(keytup), keytup, P_HIKEY))
> + elog(ERROR, "failed to rewrite compressed item in index \"%s\"",
> + RelationGetRelationName(wstate->index));

Hm...

- Andres

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Aleksander Alekseev 2017-03-30 19:27:06 Re: WIP: Covering + unique indexes.
Previous Message Thomas Munro 2017-03-30 19:02:35 Re: strange parallel query behavior after OOM crashes