Re: [HACKERS] Surjective functional indexes

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Christoph Berg <myon(at)debian(dot)org>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Surjective functional indexes
Date: 2018-05-11 11:58:03
Message-ID: 8e3a0102-07c8-f073-ac02-ddaa3e24425d@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 11.05.2018 07:48, David G. Johnston wrote:
> On Thursday, February 1, 2018, Konstantin Knizhnik
> <k(dot)knizhnik(at)postgrespro(dot)ru <mailto:k(dot)knizhnik(at)postgrespro(dot)ru>> wrote:
>
>
> Old + New for check = 2
> plus calculate again in index = 3
>
>
> Yes, we have to calculate the value of index expression for
> original and updated version of the record. If them are equal,
> then it is all we have to do with this index: hot update is
> applicable.
> In this case we calculate index expression twice.
> But if them are not equal, then hot update is not applicable and
> we have to update index. In this case expression will be
> calculated one more time. So totally three times.
> This is why, if calculation of index expression is very expensive,
> then effect of this optimization may be negative even if value of
> index expression is not changed.
>
>
> For the old/new comparison and the re-calculate if changed dynamics -
> is this a side effect of separation of concerns only or is there some
> other reason the old computed value already stored in the index isn't
> compared to the one and only function result of the new tuple which,
> if found to be different, is then stored.  One function invocation,
> which has to happen anyway, and one extra equality check.  Seems like
> this could be used for non-functional indexes too, so that mere
> presence in the update listing doesn't negate HOT if the column didn't
> actually change (if I'm not mis-remembering something here anyway...)
>
> Also, create index page doc typo from site:  "with an total" s/b "with
> a total" (expression cost less than 1000) - maybe add a comma for 1,000
>
> David J.
>
>

Sorry, may be I do not completely understand you.
So whats happed before this patch:

- On update postgres compares old and new values of all changed
attributes to determine whether them are actually changed.
- If value of some indexed attribute is changed,  then hot update is not
applicable and we have to rebuild indexed.
- Otherwise hot update is used and indexes should not be updated.

What is changed:

-  When some of attributes, on which functional index depends, is
changed, then we calculate value of index expression. It is done using
existed FormIndexDatum function which stores calculated expression value
in the provided slot. This evaluation of index expressions and their
comparison is done in access/heap/heapam.c file.
- Only if old and new values of index expression are different, then hot
update is really not applicable.
- In this case we have to rebuild indexes. It is done by
ExecInsertIndexTuples in executor/execIndexing.c which calls
FormIndexDatum once again to calculate index expression.

So in principle, it is certainly possible to store value of index
expression calculated in ProjIndexIsUnchanged and reuse it
ExecInsertIndexTuples.
But I do not know right place (context) where this value can be stored.
And also it will add some dependency between  heapam and execIndexing
modules. Also it is necessary to take in account that
ProjIndexIsUnchanged is not always called. So index expression value may
be not present.

Finally, most of practically used index expressions are not expensive.
It is usually something like extraction of JSON component. Cost of
execution of this function is much smaller than cost of extracting and
unpacking toasted JSON value. So I do not think that such optimization
will be really useful, while it is obvious that it significantly
complicate code.
Also please notice that FormIndexDatum is used in some other places:

  utils/adt/selfuncs.c
  utils/sort/tuplesort.c
  mmands/constraint.c

So this patch just adds two more calls of this function.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tatsuo Ishii 2018-05-11 12:26:42 Re: Having query cache in core
Previous Message Hartmut Holzgraefe 2018-05-11 09:52:40 Re: Having query cache in core