Re: Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]

From: Andrew Borodin <borodin(at)octonica(dot)com>
To: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <obartunov(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Sergey Mirvoda <sergey(at)mirvoda(dot)com>
Subject: Re: Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]
Date: 2016-08-18 08:05:44
Message-ID: CAJEAwVEtba7PBtez2kSRFF1yj+dcWx-HsV2P+A-NbFyNTxLYTw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thank you for your corrections.
Here is the patch with suggestions taken into account, except 6th.
>6) I'd rather use alignednewsize here.
> + ItemIdSetNormal(tupid, offset + size_diff, newsize);
This behavior is accroding to ubiquitous PageAddItem.

Best regards, Andrey Borodin, Octonica & Ural Federal University.

2016-08-16 20:23 GMT+05:00 Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>:
> 09.08.2016 19:45, Andrew Borodin:
>>
>> Here is new version of the patch, now it includes recommendations from
>> Anastasia Lubennikova.
>>
>>
>> I've investigated anamalous index size decrease. Most probable version
>> appeared to be true.
>> Cube extension, as some others, use Guttman's polynomial time split
>> algorithm. It is known to generate "needle-like" MBBs (MBRs) for
>> sorted data due to imbalanced splits (splitting 100 tuples as 98 to
>> 2). Due to previous throw-to-the-end behavior of GiST this imbalance
>> was further amplificated (most of inserts were going to bigger part
>> after split). So GiST inserts were extremely slow for sorted data.
>> There is no need to do exact sorting to trigger this behavior.
>> This behavior can be fused by implementation of small-m restriction in
>> split (AFAIR this is described in original R-tree paper from 84),
>> which prescribes to do a split in a parts no smaller than m, where m
>> is around 20% of a page capacity (in tuples number). After application
>> of this patch performance for sorted data is roughly the same as
>> performance for randomized data.
>
>
> Thank you for explanation. Now it's clear to me. I did some more testing and
> found nothing special. The declared feature is implemented correctly.
> It passes all regression tests and improves performance.
>
> I still have a few minor nitpicks about the patch style.
> You can find a style guide on
> https://www.postgresql.org/docs/9.6/static/source.html
>
> 1) remove extra whitespace in README
>
> 2) add whitespace: if(ntup == 1)
>
> 3) fix comments in accordance with coding conventions
>
> /* In case of single tuple update GiST calls overwrite
> * In all other cases function gistplacetopage deletes
> * old tuples and place updated at the end
> * */
>
>
> + /* Normally here was following assertion
> + * Assert(ItemIdHasStorage(ii));
> + * This assertion was introduced in PageIndexTupleDelete
> + * But if this function will be used from BRIN index
> + * this assertion will fail. Thus, here we do not check that
> + * item has the storage.
> + * */
>
> 4) remove unrelated changes
> - data += sizeof(OffsetNumber) * xldata->ntodelete;
> + data += sizeof(OffsetNumber) *xldata->ntodelete;
>
> 5) If the comment is correct, maxalignment is not necessary.
> + /* tuples on a page are always maxaligned */
> + oldsize = MAXALIGN(oldsize);
>
> 6) I'd rather use alignednewsize here.
> + ItemIdSetNormal(tupid, offset + size_diff, newsize);
>
>
> After the cleanup you can change status to "Ready for Committer"
> without waiting for the response.
>
>> If data is randomized this effect of Guttman poly-time split is not in
>> effect; test from the start of the thread will show no statistically
>> confident improvement by the patch.
>> To prove performance increase in randomized case I've composed
>> modified test https://gist.github.com/x4m/856b2e1a5db745f8265c76b9c195f2e1
>> This test with five runs shows following:
>> Before patch
>> Insert Time AVG 78 seconds STD 9.5
>> Afer patch
>> Insert Time AVG 68 seconds STD 7.7
>> This is marginal but statistically viable performance improvement.
>>
>> For sorted data performance is improved by a factor of 3.
>> Best regards, Andrey Borodin, Octonica & Ural Federal University.
>>
>
> --
> Anastasia Lubennikova
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>

Attachment Content-Type Size
PageIndexTupleOverwrite v8.patch application/octet-stream 8.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2016-08-18 09:12:06 Re: Missing checks when malloc returns NULL...
Previous Message Simon Riggs 2016-08-18 07:58:11 Re: Pluggable storage