Re: [PATCH] reduce page overlap of GiST indexes built using sorted method

From: "sergei sh(dot)" <sshoulbakov(at)kontur(dot)io>
To: Aliaksandr Kalenik <akalenik(at)kontur(dot)io>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [PATCH] reduce page overlap of GiST indexes built using sorted method
Date: 2022-01-09 11:28:20
Message-ID: 3aa6ba30-e9d8-10ef-753f-8deea5f196d0@kontur.io
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

here are some benchmark results for GiST patch:
https://gist.github.com/mngr777/88ae200c9c30ba5656583d92e8d2cf9e

Code used for benchmarking: https://github.com/mngr777/pg_index_bm2,
see README for the list of test queries.

Results show query performance close to indexes built with no
sortsupport function, with index creation time reduced approx. by half.

On 1/8/22 10:20 PM, Aliaksandr Kalenik wrote:
> After further testing, here is v2 where the issue that rightlink can be
> set when an index page is already flushed is fixed.
>
> On Sat, Dec 25, 2021 at 4:35 PM Andrey Borodin <x4mmm(at)yandex-team(dot)ru
> <mailto:x4mmm(at)yandex-team(dot)ru>> wrote:
>
>
> Hi Aliaksandr!
>
> Thanks for working on this!
>
> > Benchmark summary:
> >
> > create index roads_rdr_idx on roads_rdr using gist (geom);
> >
> > with sort support before patch / CREATE INDEX 76.709 ms
> >
> > with sort support after patch / CREATE INDEX 225.238 ms
> >
> > without sort support / CREATE INDEX 446.071 ms
> >
> > select count(*) from roads_rdr a, roads_rdr b where a.geom && b.geom;
> >
> > with sort support before patch / SELECT 5766.526 ms
> >
> > with sort support after patch / SELECT 2646.554 ms
> >
> > without sort support / SELECT 2721.718 ms
> >
> > index size
> >
> > with sort support before patch / IDXSIZE 2940928 bytes
> >
> > with sort support after patch / IDXSIZE 4956160 bytes
> >
> > without sort support / IDXSIZE 5447680 bytes
>
> The numbers are impressive, newly build index is actually performing
> better!
> I've conducted some tests over points, not PostGIS geometry. For
> points build is 2x slower now, but this is the cost of faster scans.
>
> Some strong points of this index building technology.
> The proposed algorithm is not randomly chosen as anything that
> performs better than the original sorting build. We actually
> researched every idea we knew from literature and intuition.
> Although we consciously limited the search area by existing GiST API.
> Stuff like penalty-based choose-subtree and split in equal halves
> seriously limit possible solutions. If anyone knows an any better
> way to build GiST faster or with better scan performance - please
> let us know.
> The proposed algorithm contains the current algorithm as a special
> case: there is a parameter - the number of buffers accumulated
> before calling Split. If this parameter is 1 proposed algorithm will
> produce exactly the same output.
>
> At this stage, we would like to hear some feedback from Postgres and
> PostGIS community. What other performance aspects should we test?
>
> Current patch implementation has some known deficiencies:
> 1. We need a GUC instead of the hard-coded buffer of 8 pages.
> 2. Is GiST sorting build still deterministic? If not - we should add
> a fixed random seed into pageinspect tests.
> 3. It would make sense to check the resulting indexes with amcheck
> [0], although it's not committed.
> 4. We cannot make an exact fillfactor due to the behavior of
> picksplit. But can we improve anything here? I think if not - it's
> still OK.
> 5. GistSortedBuildPageState is no more page state. It's Level state
> or something like that.
> 6. The patch desperately needs comments.
>
>
> Thanks!
>
> Best regards, Andrey Borodin.
>
> [0]
> https://www.postgresql.org/message-id/flat/59D0DA6B-1652-4D44-B0EF-A582D5824F83%40yandex-team.ru
> <https://www.postgresql.org/message-id/flat/59D0DA6B-1652-4D44-B0EF-A582D5824F83%40yandex-team.ru>
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Dmitry Dolgov 2022-01-09 11:43:06 Re: Multiple Query IDs for a rewritten parse tree
Previous Message Alexander Lakhin 2022-01-09 11:00:00 Re: Why is src/test/modules/committs/t/002_standby.pl flaky?