Re: Surjective functional indexes

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Christoph Berg <myon(at)debian(dot)org>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Surjective functional indexes
Date: 2017-09-28 20:37:40
Message-ID: 59CD5D94.3080501@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 09/28/2017 10:10 PM, Robert Haas wrote:
> On Wed, Sep 13, 2017 at 7:00 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> If we do have an option it won't be using fancy mathematical
>> terminology at all, it would be described in terms of its function,
>> e.g. recheck_on_update
> +1.

I have nothing against renaming "projection" option to "recheck_on_update" or whatever else is suggested.
Just let me know the best version. From my point of view "recheck_on_update" is too verbose and still not self-explained (to understand the meaning of this option it is necessary to uunderstand how heap_update works). "projection"/"non-injective"/... are
more declarative notions, explaining the characteristic of the index, while "recheck_on_update" is more procedural notion, explaining behavior of heap_update.

>
>> Yes, I'd rather not have an option at all, just some simple code with
>> useful effect, like we have in many other places.
> I think the question we need to be able to answer is: What is the
> probability that an update that would otherwise be non-HOT can be made
> into a HOT update by performing a recheck to see whether the value has
> changed? It doesn't seem easy to figure that out from any of the
> statistics we have available today or could easily get, because it
> depends not only on the behavior of the expression which appears in
> the index definition but also on the application behavior. For
> example, consider a JSON blob representing a bank account.
> b->'balance' is likely to change most of the time, but
> b->'account_holder_name' only rarely. That's going to be hard for an
> automated system to determine.
>
> We should clearly check as many of the other criteria for a HOT update
> as possible before performing a recheck of this type, so that it only
> gets performed when it might help. For example, if column a is
> indexed and b->'foo' is indexed, there's no point in checking whether
> b->'foo' has changed if we know that a has changed. I don't know
> whether it would be feasible to postpone deciding whether to do a
> recheck until after we've figured out whether the page seems to
> contain enough free space to allow a HOT update.
>
> Turning non-HOT updates into HOT updates is really good, so it seems
> likely that the rechecks will often be worthwhile. If we avoid a HOT
> update in 25% of cases, that's probably easily worth the CPU overhead
> of a recheck assuming the function isn't something ridiculously
> expensive to compute; the extra CPU cost will be repaid by reduced
> bloat. However, if we avoid a HOT update only one time in a million,
> it's probably not worth the cost of recomputing the expression the
> other 999,999 times. I wonder where the crossover point is -- it
> seems like something that could be figured out by benchmarking.
>
> While I agree that it would be nice to have this be a completely
> automatic determination, I am not sure that will be practical. I
> oppose overloading some other marker (like function_cost>10000) for
> this; that's too magical.
>
I almost agree with you.
Just few remarks: indexes are rarely created for frequently changed attributes, like b->'balance'.
So in case of proper database schema design it is possible to expect that most of updates are hot updates: do not actually affect any index.
But certainly different attributes may have different probability of been updated.
Unfortunately we do not know before check which attribute of JSON field (or any other fields used in indexed expression) is changed.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2017-09-28 21:15:15 Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Previous Message Oleg Bartunov 2017-09-28 20:25:16 Re: Surjective functional indexes