Re: GiST penalty functions [PoC]

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: amborodin(at)acm(dot)org, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Subject: Re: GiST penalty functions [PoC]
Date: 2016-09-06 16:57:28
Message-ID: 236d7d60-4415-f4c9-34d7-656a8314d1bd@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 08/29/2016 09:04 AM, Andrew Borodin wrote:
> In this message I address only first problem. I want to construct a
> penalty function that will choose:
> 1. Entry with a zero volume and smallest margin, that can
> accommodate insertion tuple without extension, if one exists;
> 2. Entry with smallest volume, that can accommodate insertion tuple
> without extension, if one exists;
> 3. Entry with zero volume increase after insert and smallest margin
> increase, if one exists;
> 4. Entry with minimal volume increase after insert.

Looking at the code, by "margin", you mean the sum of all "edges", i.e.
of each dimension, of the cube. I guess the point of that is to
differentiate between cubes that have the same volume, but a different
shape.

So, let's try to come up with a scheme that doesn't require IEEE 754
floats. Those above cases can be split into two categories, depending on
whether the new value has zero volume or not. We can use a completely
different scheme for the two cases, if we want, because when we're
choosing a target page to insert to, penalty gets called for every
original tuple, but with the same new tuple.

Here's a scheme I just came up with. There might be better ones, but
it's something.

Let's have:

oX: Original tuple's edge sum
nX: New tuple's edge sum
dX: Edge increase

oV: Original tuple's volume
nX: New tuple's volume
dX: Volume increase

Case A: New entry has zero volume.
------

Two sub-cases:
A1: if dE > 0, use dE. dE must be in the range [0, nE]
A2: otherwise, use oE.

So how do we cram those two into a single float?

If we offset case A1 by n, we can use the range between [0, nE] for A2.
Something like this pseudocode:

if (dE > 0)
return nE + dE; /* A1, offset dE into range [nE, inf] */
else
return nE - (nE/oE); /* A2, scale oE into range [0, nE] */

Case B: New entry has non-zero volume
------
B1: if dV > 0. use dV. dV must be in the range [0, nV].
B2: if dE > 0, use dE. dE must be in the range [0, nE].
B3: oV, otherwise.

By offsetting cases B1 and B2, we can again divide the range into three
parts, with pseudocode like:

if (dV > 0)
return nV + nE + dV; /* B1, offset dV into range [nV+nE, inf] */
else if (dE > 0)
return nV + dE; /* B2, offset dE into range [nV, nV+nE] */
else
return nV - (nV/oV) /* B3, scale oV into range [0, nV]

> I’ve tested patch performance with attached test. On my machine patch
> slows index construction time from 60 to 76 seconds, reduces size of
> the index from 85Mb to 82Mb, reduces time of aggregates computation
> from 36 seconds to 29 seconds.

Hmm. That's a pretty large increase in construction time. Have you done
any profiling on where the time goes?

> I do not know: should I continue this work in cube, or it’d be better
> to fork cube?

Should definitely continue work within cube, IMHO. I don't think forking
it to a new datatype would make this any easier.

- Heikki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Claudio Freire 2016-09-06 16:58:25 Re: Vacuum: allow usage of more than 1GB of work mem
Previous Message Anastasia Lubennikova 2016-09-06 16:48:31 Re: WIP: Covering + unique indexes.