Re: [PATCH] reduce page overlap of GiST indexes built using sorted method

From: Aliaksandr Kalenik <akalenik(at)kontur(dot)io>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [PATCH] reduce page overlap of GiST indexes built using sorted method
Date: 2022-01-08 19:20:45
Message-ID: CAHqSB9hq=r0tmivQGkTE0JRgp1qJSWV+7qGqLXPu5XLfEWXUOg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

After further testing, here is v2 where the issue that rightlink can be set
when an index page is already flushed is fixed.

On Sat, Dec 25, 2021 at 4:35 PM Andrey Borodin <x4mmm(at)yandex-team(dot)ru> wrote:

>
> Hi Aliaksandr!
>
> Thanks for working on this!
>
> > Benchmark summary:
> >
> > create index roads_rdr_idx on roads_rdr using gist (geom);
> >
> > with sort support before patch / CREATE INDEX 76.709 ms
> >
> > with sort support after patch / CREATE INDEX 225.238 ms
> >
> > without sort support / CREATE INDEX 446.071 ms
> >
> > select count(*) from roads_rdr a, roads_rdr b where a.geom && b.geom;
> >
> > with sort support before patch / SELECT 5766.526 ms
> >
> > with sort support after patch / SELECT 2646.554 ms
> >
> > without sort support / SELECT 2721.718 ms
> >
> > index size
> >
> > with sort support before patch / IDXSIZE 2940928 bytes
> >
> > with sort support after patch / IDXSIZE 4956160 bytes
> >
> > without sort support / IDXSIZE 5447680 bytes
>
> The numbers are impressive, newly build index is actually performing
> better!
> I've conducted some tests over points, not PostGIS geometry. For points
> build is 2x slower now, but this is the cost of faster scans.
>
> Some strong points of this index building technology.
> The proposed algorithm is not randomly chosen as anything that performs
> better than the original sorting build. We actually researched every idea
> we knew from literature and intuition. Although we consciously limited the
> search area by existing GiST API.
> Stuff like penalty-based choose-subtree and split in equal halves
> seriously limit possible solutions. If anyone knows an any better way to
> build GiST faster or with better scan performance - please let us know.
> The proposed algorithm contains the current algorithm as a special case:
> there is a parameter - the number of buffers accumulated before calling
> Split. If this parameter is 1 proposed algorithm will produce exactly the
> same output.
>
> At this stage, we would like to hear some feedback from Postgres and
> PostGIS community. What other performance aspects should we test?
>
> Current patch implementation has some known deficiencies:
> 1. We need a GUC instead of the hard-coded buffer of 8 pages.
> 2. Is GiST sorting build still deterministic? If not - we should add a
> fixed random seed into pageinspect tests.
> 3. It would make sense to check the resulting indexes with amcheck [0],
> although it's not committed.
> 4. We cannot make an exact fillfactor due to the behavior of picksplit.
> But can we improve anything here? I think if not - it's still OK.
> 5. GistSortedBuildPageState is no more page state. It's Level state or
> something like that.
> 6. The patch desperately needs comments.
>
>
> Thanks!
>
> Best regards, Andrey Borodin.
>
> [0]
> https://www.postgresql.org/message-id/flat/59D0DA6B-1652-4D44-B0EF-A582D5824F83%40yandex-team.ru
>

Attachment Content-Type Size
02_reduce_page_overlap_of_gist_indexes_built_using_sorted_method.patch application/octet-stream 10.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jake North 2022-01-08 19:26:51 Re: [feature request] ts_headline should have an option to highlight only full matches of <-> expressions
Previous Message Jake North 2022-01-08 19:09:24 [feature request] ts_headline should have an option to highlight only full matches of <-> expressions