Re: Surjective functional indexes

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Christoph Berg <myon(at)debian(dot)org>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Surjective functional indexes
Date: 2017-09-15 08:16:05
Message-ID: 251f2f30-7e38-f417-5eaf-fd9f11d00932@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 14.09.2017 18:53, Simon Riggs wrote:
>
>>> This works by looking at overall stats, and only looks at the overall
>>> HOT %, so its too heavyweight and coarse.
>>>
>>> I suggested storing stat info on the relcache and was expecting you
>>> would look at how often the expression evaluates to new == old. If we
>>> evaluate new against old many times, then if the success rate is low
>>> we should stop attempting the comparison. (<10%?)
>>>
>>> Another idea:
>>> If we don't make a check when we should have done then we will get a
>>> non-HOT update, so we waste time extra time difference between a HOT
>>> and non-HOT update. If we check and fail we waste time take to perform
>>> check. So the question is how expensive the check is against how
>>> expensive a non-HOT update is. Could we simply say we don't bother to
>>> check functions that have a cost higher than 10000? So if the user
>>> doesn't want to perform the check they can just increase the cost of
>>> the function above the check threshold?
>>>
>> Attached pleased find one more patch which calculates hot update check hit
>> rate more precisely: I have to extended PgStat_StatTabEntry with two new
>> fields:
>> hot_update_hits and hot_update_misses.
> It's not going to work, as already mentioned above. Those stats are at
> table level and very little to do with this particular index.
>
> But you've not commented on the design I mention that can work: index relcache.
Sorry, I do not completely agree with you.
Yes, certainly whether functional index is projective or not is property
of the index, not of the table.
But the decision whether hot update is applicable or not is made for the
whole table - for all indexes.
If a value of just one indexed expressions is changed then we can not
use hot update and have to update all indexes.

Assume that we have table with "bookinfo" field of type JSONB.
And we create several functional indexes on this column:
(bookinfo->'isbn'), (bookinfo->'title'), (bookinfo->'author'),
(bookinfo->'rating').
Probability that indexed expression is changed is case of updating
"bookinfo" field my be different for all this three indexes.
But there is completely no sense to check if 'isbn' is changed or not,
if we already detect that most updates cause change of 'rating'
attribute and
so comparing old and new values of (bookinfo->'rating') is just waste of
time. In this case we should not also compare (bookinfo->'isbn') and
other indexed expressions because for already rejected possibility of
hot update.

So despite to the fact that this information depends on particular
index, it affects behavior of the whole table and it is reasonable (and
simpler) to collect it in table's statistic.

>> Concerning your idea to check cost of index function: it certainly makes
>> sense.
>> The only problems: I do not understand now how to calculate this cost.
>> It can be easily calculated by optimizer when it is building query execution
>> plan.
>> But inside BuildIndexInfo I have just reference to Relation and have no idea
>> how
>> I can propagate here information about index expression cost from optimizer.
> We could copy at create index, if we took that route. Or we can look
> up the cost for the index expression and cache it.
>
>
> Anyway, this is just jumping around because we still have a parameter
> and the idea was to remove the parameter entirely by autotuning, which
> I think is both useful and possible, just as HOT itself is autotuned.
>

Hot update in almost all cases is preferable to normal update, causing
update of indexes.
There are can be some scenarios when hot updates reduce speed of some
queries,
but it is very difficult to predict such cases user level.

But usually nature of index is well known by DBA or programmer. In
almost all cases it is clear for person creating functional index
whether it will perform projection or not
and whether comparing old/new expression value makes sense or is just
waste of time. We can guess it from autotune, but such decision may be
wrong (just because of application
business logic). Postgres indexes already have a lot of options. And I
think that "projection" option (or whatever we name it) is also needed.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro HORIGUCHI 2017-09-15 08:23:28 Re: [PATCH] Improve geometric types
Previous Message Alexander Korotkov 2017-09-15 08:10:38 Re: COMMIT TRIGGERs, take n, implemented with CONSTRAINT TRIGGERS