Re: Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]

From: Andrew Borodin <borodin(at)octonica(dot)com>
To: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <obartunov(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Sergey Mirvoda <sergey(at)mirvoda(dot)com>
Subject: Re: Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]
Date: 2016-08-09 16:45:47
Message-ID: CAJEAwVEVq9Ry7KxApHbFTB67E0ntn2EBTQiUXBG=q5E4P7w1VQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Here is new version of the patch, now it includes recommendations from
Anastasia Lubennikova.

I've investigated anamalous index size decrease. Most probable version
appeared to be true.
Cube extension, as some others, use Guttman's polynomial time split
algorithm. It is known to generate "needle-like" MBBs (MBRs) for
sorted data due to imbalanced splits (splitting 100 tuples as 98 to
2). Due to previous throw-to-the-end behavior of GiST this imbalance
was further amplificated (most of inserts were going to bigger part
after split). So GiST inserts were extremely slow for sorted data.
There is no need to do exact sorting to trigger this behavior.
This behavior can be fused by implementation of small-m restriction in
split (AFAIR this is described in original R-tree paper from 84),
which prescribes to do a split in a parts no smaller than m, where m
is around 20% of a page capacity (in tuples number). After application
of this patch performance for sorted data is roughly the same as
performance for randomized data.

If data is randomized this effect of Guttman poly-time split is not in
effect; test from the start of the thread will show no statistically
confident improvement by the patch.
To prove performance increase in randomized case I've composed
modified test https://gist.github.com/x4m/856b2e1a5db745f8265c76b9c195f2e1
This test with five runs shows following:
Before patch
Insert Time AVG 78 seconds STD 9.5
Afer patch
Insert Time AVG 68 seconds STD 7.7
This is marginal but statistically viable performance improvement.

For sorted data performance is improved by a factor of 3.

Best regards, Andrey Borodin, Octonica & Ural Federal University.

Attachment Content-Type Size
PageIndexTupleOverwrite v7.patch application/octet-stream 8.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2016-08-09 17:03:33 Re: patch: xmltable - proof concept
Previous Message Andrew Gierth 2016-08-09 16:32:51 Re: No longer possible to query catalogs for index capabilities?