Re: Surjective functional indexes

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Surjective functional indexes
Date: 2017-05-25 17:10:46
Message-ID: 9f8b34b3-0db6-0887-256e-7e5ca8b4b047@postgrespro.ru
Lists: pgsql-hackers

On 25.05.2017 19:37, Tom Lane wrote:
> Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru> writes:
>> My proposal is to check value of function for functional indexes instead
>> of just comparing set of affected attributes.
>> Obviously, for some complex functions it may have negative effect on
>> update speed.
>> This is why I have added "surjective" option to index.
> This seems overcomplicated. We would have to compute the function
> value at some point anyway. Can't we refactor to do that earlier?
>
> regards, tom lane

The check for affected indexes (that is, whether a HOT update is applicable)
and the update of the indexes themselves are done in two completely different
parts of the code. If we find that the values of the indexed expressions have
not changed, then we can use a HOT update and the indexes do not need to be
updated (so the calculated value of the function is not needed). This is
expected to be the most frequent case.
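
A minimal sketch of this check, written in Python as a toy model rather than
as PostgreSQL code. The names can_use_hot and name_of and the dict-based
"tuples" are hypothetical and exist only for illustration; the real decision
is made in C in the heap update path.

```python
import json

def can_use_hot(old_tuple, new_tuple, index_exprs):
    """Toy model: HOT is applicable if no indexed expression changes its value."""
    return all(expr(old_tuple) == expr(new_tuple) for expr in index_exprs)

# Hypothetical functional index on the "name" key of a JSON column.
def name_of(row):
    return json.loads(row["info"])["name"]

old = {"id": 1, "info": json.dumps({"name": "alice", "visits": 1})}
new = {"id": 1, "info": json.dumps({"name": "alice", "visits": 2})}
renamed = {"id": 1, "info": json.dumps({"name": "bob", "visits": 2})}

# The indexed attribute "info" changed, but the indexed expression did not.
attribute_changed = old["info"] != new["info"]
hot_ok = can_use_hot(old, new, [name_of])

# Here the expression value itself changes, so HOT is ruled out.
hot_after_rename = can_use_hot(old, renamed, [name_of])
```

Comparing only the set of modified attributes would reject HOT for the first
update, because "info" was modified; comparing expression values accepts it.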

Certainly, if the value of an indexed expression has changed, we could avoid
the redundant function calculation by storing the result somewhere.
But that would greatly complicate the whole logic of updating indexes. Please
note that if we have several functional indexes and only one of them has
actually changed, then we cannot use HOT in any case and have to update all
indexes. So we do not need to evaluate all the indexed expressions; we just
need to find the first changed one. To reuse the results, we would therefore
have to keep track of which expressions have already been calculated and
which have not.
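
The "find the first changed one" step can be sketched as follows. This is
again a toy Python model under the same assumptions; first_changed,
make_expr and the index names are hypothetical.

```python
def first_changed(old_tuple, new_tuple, index_exprs):
    """Return the position of the first expression whose value differs, else None."""
    for pos, expr in enumerate(index_exprs):
        if expr(old_tuple) != expr(new_tuple):
            # HOT is already ruled out, so later expressions are never evaluated.
            return pos
    return None

calls = []  # records which (hypothetical) index expressions were evaluated

def make_expr(name, column):
    def expr(row):
        calls.append(name)
        return row[column]
    return expr

exprs = [make_expr("idx_a", "a"), make_expr("idx_b", "b"), make_expr("idx_c", "c")]
old = {"a": 1, "b": 2, "c": 3}
new = {"a": 1, "b": 20, "c": 30}

changed_at = first_changed(old, new, exprs)
```

After this call idx_a and idx_b have been evaluated for both tuple versions
and idx_c for neither. A cache meant for the later index update would have to
record exactly that, which is the bookkeeping mentioned above.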

One more argument. Originally Postgres evaluated an index expression only
once (when inserting the new version of the tuple into the index).
Now (with this patch) Postgres has to evaluate the expression three times in
the worst case: it calculates the value of the expression for the old and new
tuples to make a decision about the HOT update, and then evaluates it once
again when performing the index update itself. Even if I managed to store the
calculated value of the expression somewhere, we would still have to perform
twice as many evaluations as before. This is why such a policy should be
disabled for expensive functions, or for functions defined on frequently
updated attributes (as in the case of JSON).
For inexpensive functions the extra overhead is negligible. There is also
no overhead at all if the indexed expression is not actually changed,
which is expected to be the most frequent case.
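
The evaluation counts above (one, three, and two) can be reproduced with a
small counting sketch. It is a toy Python model; CountingExpr and the three
update_* functions are hypothetical stand-ins for the code paths being
compared, not actual PostgreSQL functions.

```python
class CountingExpr:
    """Wraps an index expression and counts how often it is evaluated."""
    def __init__(self, func):
        self.func = func
        self.calls = 0

    def __call__(self, row):
        self.calls += 1
        return self.func(row)

def update_original(expr, old, new):
    # Before the patch: evaluated once, when the new version is indexed.
    return expr(new)

def update_with_check(expr, old, new):
    # With the patch: old and new values are compared for the HOT decision ...
    if expr(old) == expr(new):
        return None  # HOT update, no index insertion
    # ... and the expression is evaluated again for the index update.
    return expr(new)

def update_with_check_cached(expr, old, new):
    # Hypothetical variant that keeps the new value from the HOT check.
    new_key = expr(new)
    if expr(old) == new_key:
        return None
    return new_key

old, new = {"v": 1}, {"v": 2}  # the indexed expression value changes
counts = {}
for update in (update_original, update_with_check, update_with_check_cached):
    expr = CountingExpr(lambda row: row["v"] * 2)
    update(expr, old, new)
    counts[update.__name__] = expr.calls
```

For a changed value, counts holds 1 evaluation for the original path, 3 for
the check without caching, and 2 with the cached value.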

At least in the particular case of the YCSB benchmark, our first attempt
was simply to disable the index update by commenting out the corresponding
check of the updated-fields mask.
Obviously there are no extra function calculations in this case. Then I
implemented this patch, and the performance is almost the same.
This is why I think that simplicity and modularity of the code are more
important here than eliminating the redundant function calculation.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
