Re: GiST "choose subtree" support function to inline penalty

From: Darafei "Komяpa" Praliaskouski <me(at)komzpa(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: GiST "choose subtree" support function to inline penalty
Date: 2019-06-27 10:50:04
Message-ID: CAC8Q8tKRvnkCVzC3KQ1wShSz02cgKqdJ_jyLmw_Tx8h4NzK95A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jun 27, 2019 at 6:00 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> =?UTF-8?Q?Darafei_=22Kom=D1=8Fpa=22_Praliaskouski?= <me(at)komzpa(dot)net>
> writes:
> > I'm looking at PostGIS geometry GiST index build times and try to
> optimize
> > withing the current GiST framework. The function that shows a lot on my
> > flame graphs is penalty.
>
> > I spent weekend rewriting PostGIS penalty to be as fast as possible.
> > (FYI https://github.com/postgis/postgis/pull/425/files)
>
> > However I cannot get any meaningfully faster build time. Even when I
> strip
> > it to "just return edge extension" index build time is the same.
>
> TBH this makes me wonder whether the real problem isn't so much "penalty
> function is too slow" as "penalty function is resulting in really bad
> index splits".
>

As an extension writer I don't have much control on how Postgres calls
penalty function. PostGIS box is using floats instead of doubles, so its
size is twice as small as postgres builtin box, meaning penalty is called
even more often on better packed pages.

I can get index construction speed to be much faster if I break penalty to
actually result in horrible splits: index size grows 50%, construction is
30% faster.

>
> It might be that giving the opclass higher-level control over the split
> decision can help with both aspects.

Please note the question is not about split. Korotkov's split is working
fine. Issue is with penalty and computations required for choosing the
subtree before split happens.

Andrey Borodin proposed off-list that we can provide our own index type
that is a copy of GiST but with penalty inlined into "choose subtree" code
path, as that seems to be the only way to do it in PG12. Is there a more
humane option than forking GiST?

> But never start micro-optimizing
> an algorithm until you're sure it's the right algorithm.
>

That's exactly the reason I write original letter. I don't see any option
for further optimization in existing GiST framework, but this optimization
is needed: waiting 10 hours for GiST to build after an hour of ingesting
the dataset is frustrating, especially when you see a nearby b-tree done in
an hour.

--
Darafei Praliaskouski
Support me: http://patreon.com/komzpa

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Etsuro Fujita 2019-06-27 11:05:36 Obsolete comment in commands/analyze.c
Previous Message Arthur Zakirov 2019-06-27 09:12:15 Extra quote_all_identifiers in _dumpOptions