GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]

From: Andrew Borodin <borodin(at)octonica(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Oleg Bartunov <obartunov(at)gmail(dot)com>, hlinnaka(at)iki(dot)fi, Sergey Mirvoda <sergey(at)mirvoda(dot)com>
Subject: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]
Date: 2016-07-03 09:24:58
Message-ID: CAJEAwVGQjGGOj6mMSgMwGvtFd5Kwe6VFAxY=uEPZWMDjzbn4VQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi, hackers!

I think there is some room for improving GiST inserts. Following is
the description of what I think is performance problem.

Function gistplacetopage in file /src/backend/access/gist/gist.c is
responsible for updating or appending new tuple on GiST page.
Currently, after checking necessity of page split due to overflow, it
essentially executes following:

if (OffsetNumberIsValid(oldoffnum))
PageIndexTupleDelete(page, oldoffnum);
gistfillbuffer(page, itup, ntup, InvalidOffsetNumber);

That is: remove old tuple if it’s there, then place updated tuple at the end.

Half of the old data have to be shifted my memmove inside
PageIndexTupleDelete() call, half of the linp-s have to be corrected.

If the updated tuple has same size as already residing on page we can
just overwrite it. Attached patch demonstrates that concept. Attached
test.sql inserts million rows into GiST index based on cube extension.
My machine is Hyper-V VM running Ubuntu on i5-2500 CPU with SSD
storage. Before patch, insert part of test is executed on average
within 159 second, after patch application: insert part is executed
within 77 seconds on average. That is almost twice faster (for
CPU\Mem-bounded inserts, disk-constrained test will show no
improvement). But it works only for fixed-size tuple inserts.

I know that code in patch is far from beautiful: it operates with
three different levels of abstraction within 5 lines of code. Those
are low level memmove(), system-wide PageAddItem() and GiST private
gistfillBuffer().

By the way PageAddItem() have overwrite flag, but it only works with
unused ItemId’s. Marking old ItemId as unused before PageAddItem()
didn’t work for me. Unfortunately bufpage.c routines do not contain
one for updating(replacing with new) tuple on page. It is important
for me because I’m working on advanced GiST page layout (
https://www.postgresql.org/message-id/CAJEAwVE0rrr%2BOBT-P0gDCtXbVDkBBG_WcXwCBK%3DGHo4fewu3Yg%40mail.gmail.com
), current approach is to use skip-tuples which can be used to skip
many flowing tuples with one key check. Obviously, this design cares
about tuples order. And update in a fashion “place updated tuple at
the end” won’t work for me.

So, I think it would be better to implement PageReplaceItem()
functionality to make code better, to make existing GiST inserts
faster and to enable new advanced page layouts in indices.

Current work is here https://github.com/x4m/pggistopt/tree/simplefix

What do you think about found performance problem and about patch
attached? Am I missing something?

Best regards, Andrey Borodin, Octonica & Ural Federal University.

Attachment Content-Type Size
test.sql application/octet-stream 428 bytes
GiST memmove.patch application/octet-stream 1.5 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2016-07-03 10:21:12 Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]
Previous Message Jaime Casanova 2016-07-03 05:05:40 Re: to_date_valid()