Re: Surjective functional indexes

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Christoph Berg <myon(at)debian(dot)org>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Surjective functional indexes
Date: 2017-09-14 15:37:10
Message-ID: bd2b12b9-ed2a-75d3-c501-128f78831a61@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 14.09.2017 13:19, Simon Riggs wrote:
> On 14 September 2017 at 10:42, Konstantin Knizhnik
> <k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
>>
>> On 13.09.2017 14:00, Simon Riggs wrote:
>>> On 13 September 2017 at 11:30, Konstantin Knizhnik
>>> <k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
>>>
>>>> The only reason of all this discussion about terms is that I need to
>>>> choose
>>>> name for correspondent index option.
>>>> Simon think that we do not need this option at all. In this case we
>>>> should
>>>> not worry about right term.
>>>> From my point of view, "projection" is quite clear notion and not only
>>>> for
>>>> mathematics. It is also widely used in IT and especially in DBMSes.
>>> If we do have an option it won't be using fancy mathematical
>>> terminology at all, it would be described in terms of its function,
>>> e.g. recheck_on_update
>>>
>>> Yes, I'd rather not have an option at all, just some simple code with
>>> useful effect, like we have in many other places.
>>>
>> Attached please find new version of projection functional index optimization
>> patch.
>> I have implemented very simple autotune strategy: now I use table statistic
>> to compare total number of updates with number of hot updates.
>> If fraction of hot updates is relatively small, then there is no sense to
>> spend time performing extra evaluation of index expression and comparing its
>> old and new values.
>> Right now the formula is the following:
>>
>> #define MIN_UPDATES_THRESHOLD 10
>> #define HOT_RATIO_THRESHOLD 2
>>
>> if (stat->tuples_updated > MIN_UPDATES_THRESHOLD
>> && stat->tuples_updated >
>> stat->tuples_hot_updated*HOT_RATIO_THRESHOLD)
>> {
>> /* If percent of hot updates is small, then disable projection
>> index function
>> * optimization to eliminate overhead of extra index expression
>> evaluations.
>> */
>> ii->ii_Projection = false;
>> }
>>
>> This threshold values are pulled out of a hat: I am not sure if this
>> heuristic is right.
>> I will be please to get feedback if such approach to autotune is promising.
> Hmm, not really, but thanks for trying.
>
> This works by looking at overall stats, and only looks at the overall
> HOT %, so its too heavyweight and coarse.
>
> I suggested storing stat info on the relcache and was expecting you
> would look at how often the expression evaluates to new == old. If we
> evaluate new against old many times, then if the success rate is low
> we should stop attempting the comparison. (<10%?)
>
> Another idea:
> If we don't make a check when we should have done then we will get a
> non-HOT update, so we waste time extra time difference between a HOT
> and non-HOT update. If we check and fail we waste time take to perform
> check. So the question is how expensive the check is against how
> expensive a non-HOT update is. Could we simply say we don't bother to
> check functions that have a cost higher than 10000? So if the user
> doesn't want to perform the check they can just increase the cost of
> the function above the check threshold?
>
Attached pleased find one more patch which calculates hot update check
hit rate more precisely: I have to extended PgStat_StatTabEntry with two
new fields:
hot_update_hits and hot_update_misses.

Concerning your idea to check cost of index function: it certainly makes
sense.
The only problems: I do not understand now how to calculate this cost.
It can be easily calculated by optimizer when it is building query
execution plan.
But inside BuildIndexInfo I have just reference to Relation and have no
idea how
I can propagate here information about index expression cost from optimizer.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
projection-autotune2.patch text/x-patch 21.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jesper Pedersen 2017-09-14 15:39:26 Re: [POC] hash partitioning
Previous Message Robert Haas 2017-09-14 15:28:47 Re: Is it time to kill support for very old servers?