Re: Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]

From: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
To: amborodin(at)acm(dot)org
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <obartunov(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Sergey Mirvoda <sergey(at)mirvoda(dot)com>
Subject: Re: Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]
Date: 2016-08-16 15:23:57
Message-ID: ac2d0b41-84e2-153c-d41a-3456fdeb5785@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

09.08.2016 19:45, Andrew Borodin:
> Here is new version of the patch, now it includes recommendations from
> Anastasia Lubennikova.
>
>
> I've investigated anamalous index size decrease. Most probable version
> appeared to be true.
> Cube extension, as some others, use Guttman's polynomial time split
> algorithm. It is known to generate "needle-like" MBBs (MBRs) for
> sorted data due to imbalanced splits (splitting 100 tuples as 98 to
> 2). Due to previous throw-to-the-end behavior of GiST this imbalance
> was further amplificated (most of inserts were going to bigger part
> after split). So GiST inserts were extremely slow for sorted data.
> There is no need to do exact sorting to trigger this behavior.
> This behavior can be fused by implementation of small-m restriction in
> split (AFAIR this is described in original R-tree paper from 84),
> which prescribes to do a split in a parts no smaller than m, where m
> is around 20% of a page capacity (in tuples number). After application
> of this patch performance for sorted data is roughly the same as
> performance for randomized data.

Thank you for explanation. Now it's clear to me. I did some more testing and
found nothing special. The declared feature is implemented correctly.
It passes all regression tests and improves performance.

I still have a few minor nitpicks about the patch style.
You can find a style guide on
https://www.postgresql.org/docs/9.6/static/source.html

1) remove extra whitespace in README

2) add whitespace: if(ntup == 1)

3) fix comments in accordance with coding conventions

/* In case of single tuple update GiST calls overwrite
* In all other cases function gistplacetopage deletes
* old tuples and place updated at the end
* */

+ /* Normally here was following assertion
+ * Assert(ItemIdHasStorage(ii));
+ * This assertion was introduced in PageIndexTupleDelete
+ * But if this function will be used from BRIN index
+ * this assertion will fail. Thus, here we do not check that
+ * item has the storage.
+ * */

4) remove unrelated changes
- data += sizeof(OffsetNumber) * xldata->ntodelete;
+ data += sizeof(OffsetNumber) *xldata->ntodelete;

5) If the comment is correct, maxalignment is not necessary.
+ /* tuples on a page are always maxaligned */
+ oldsize = MAXALIGN(oldsize);

6) I'd rather use alignednewsize here.
+ ItemIdSetNormal(tupid, offset + size_diff, newsize);

After the cleanup you can change status to "Ready for Committer"
without waiting for the response.

> If data is randomized this effect of Guttman poly-time split is not in
> effect; test from the start of the thread will show no statistically
> confident improvement by the patch.
> To prove performance increase in randomized case I've composed
> modified test https://gist.github.com/x4m/856b2e1a5db745f8265c76b9c195f2e1
> This test with five runs shows following:
> Before patch
> Insert Time AVG 78 seconds STD 9.5
> Afer patch
> Insert Time AVG 68 seconds STD 7.7
> This is marginal but statistically viable performance improvement.
>
> For sorted data performance is improved by a factor of 3.
> Best regards, Andrey Borodin, Octonica & Ural Federal University.
>

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2016-08-16 15:24:21 Re: PSA: Systemd will kill PostgreSQL
Previous Message Yury Zhuravlev 2016-08-16 15:22:17 Re: [GENERAL] C++ port of Postgres