Re: Yet another fast GiST build

From: Darafei "Komяpa" Praliaskouski <me(at)komzpa(dot)net>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Yet another fast GiST build
Date: 2019-08-26 10:27:27
Message-ID: CAC8Q8t+WqGkgKh3ptarkiaE7unxJR-RFoTNsy=0=4u5F9wo8Bw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,

This is very interesting. In my pipeline currently GiST index rebuild is
the biggest time consuming step.

I believe introducing optional concept of order in the GiST opclass will be
beneficial not only for fast build, but for other tasks later:
- CLUSTER can order the table using that notion, in parallel way.
- btree_gist can be even closer to btree by getting the tuples sorted
inside page.
- tree descend on insertion in future can traverse the list in more
opportunistic way, calculating penalty for siblings-by-order first.

I believe everywhere the idea of ordering is needed it's provided by giving
a btree opclass.

How about giving a link to btree opclass inside a gist opclass?

On Mon, Aug 26, 2019 at 10:59 AM Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
wrote:

> Hi!
>
> In many cases GiST index can be build fast using z-order sorting.
>
> I've looked into proof of concept by Nikita Glukhov [0] and it looks very
> interesting.
> So, I've implemented yet another version of B-tree-like GiST build.
> It's main use case and benefits can be summarized with small example:
>
> postgres=# create table x as select point (random(),random()) from
> generate_series(1,3000000,1);
> SELECT 3000000
> Time: 5061,967 ms (00:05,062)
> postgres=# create index ON x using gist (point ) with
> (fast_build_sort_function=gist_point_sortsupport);
> CREATE INDEX
> Time: 6140,227 ms (00:06,140)
> postgres=# create index ON x using gist (point );
> CREATE INDEX
> Time: 32061,200 ms (00:32,061)
>
> As you can see, Z-order build is on order of magnitude faster. Select
> performance is roughly the same. Also, index is significantly smaller.
>
> Nikita's PoC is faster because it uses parallel build, but it intervenes
> into B-tree code a lot (for reuse). This patchset is GiST-isolated.
> My biggest concern is that passing function to relation option seems a bit
> hacky. You can pass there any function matching sort support signature.
> Embedding this function into opclass makes no sense: it does not affect
> scan anyhow.
>
> In current version, docs and tests are not implemented. I want to discuss
> overall design. Do we really want yet another GiST build, if it is 3-10
> times faster?
>
> Thanks!
>
> Best regards, Andrey Borodin.
>
> [0]
> https://github.com/postgres/postgres/compare/master...glukhovn:gist_btree_build
>
>

--
Darafei Praliaskouski
Support me: http://patreon.com/komzpa

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2019-08-26 10:47:22 Re: Yet another fast GiST build
Previous Message Richard Guo 2019-08-26 09:32:48 A problem about partitionwise join