Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]

From: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To: amborodin(at)acm(dot)org
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <obartunov(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Sergey Mirvoda <sergey(at)mirvoda(dot)com>
Subject: Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]
Date: 2016-07-03 10:21:12
Message-ID: CAPpHfdsKRPVXj44dqc-TMXRgWStPGSV+Yrev8R53JRSoFoatyQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi!

On Sun, Jul 3, 2016 at 12:24 PM, Andrew Borodin <borodin(at)octonica(dot)com>
wrote:

> I think there is some room for improving GiST inserts. Following is
> the description of what I think is performance problem.
>
> Function gistplacetopage in file /src/backend/access/gist/gist.c is
> responsible for updating or appending new tuple on GiST page.
> Currently, after checking necessity of page split due to overflow, it
> essentially executes following:
>
> if (OffsetNumberIsValid(oldoffnum))
> PageIndexTupleDelete(page, oldoffnum);
> gistfillbuffer(page, itup, ntup, InvalidOffsetNumber);
>
> That is: remove old tuple if it’s there, then place updated tuple at the
> end.
>
> Half of the old data have to be shifted my memmove inside
> PageIndexTupleDelete() call, half of the linp-s have to be corrected.
>
> If the updated tuple has same size as already residing on page we can
> just overwrite it. Attached patch demonstrates that concept. Attached
> test.sql inserts million rows into GiST index based on cube extension.
> My machine is Hyper-V VM running Ubuntu on i5-2500 CPU with SSD
> storage. Before patch, insert part of test is executed on average
> within 159 second, after patch application: insert part is executed
> within 77 seconds on average. That is almost twice faster (for
> CPU\Mem-bounded inserts, disk-constrained test will show no
> improvement). But it works only for fixed-size tuple inserts.
>

Very promising results!

I know that code in patch is far from beautiful: it operates with
> three different levels of abstraction within 5 lines of code. Those
> are low level memmove(), system-wide PageAddItem() and GiST private
> gistfillBuffer().
>
> By the way PageAddItem() have overwrite flag, but it only works with
> unused ItemId’s. Marking old ItemId as unused before PageAddItem()
> didn’t work for me. Unfortunately bufpage.c routines do not contain
> one for updating(replacing with new) tuple on page. It is important
> for me because I’m working on advanced GiST page layout (
>
> https://www.postgresql.org/message-id/CAJEAwVE0rrr%2BOBT-P0gDCtXbVDkBBG_WcXwCBK%3DGHo4fewu3Yg%40mail.gmail.com
> ), current approach is to use skip-tuples which can be used to skip
> many flowing tuples with one key check. Obviously, this design cares
> about tuples order. And update in a fashion “place updated tuple at
> the end” won’t work for me.
>
> So, I think it would be better to implement PageReplaceItem()
> functionality to make code better, to make existing GiST inserts
> faster and to enable new advanced page layouts in indices.
>

+1 for PageReplaceItem()
Even if item is not the same size, we can move the tail of page once
instead of twice.
I think you should implement PageReplaceItem() version and add it to the
commitfest.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andreas 'ads' Scherbaum 2016-07-03 10:36:04 Re: to_date_valid()
Previous Message Andrew Borodin 2016-07-03 09:24:58 GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]