Re: GSoC 2017

From: Andrew Borodin <borodin(at)octonica(dot)com>
To: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Thom Brown <thom(at)linux(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>
Subject: Re: GSoC 2017
Date: 2017-01-10 17:13:32
Message-ID: CAJEAwVGvVh-=S0SchTCkdWf-oiaeOMew7uiqVFYcV7c=PzP36g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

2017-01-10 14:53 GMT+05:00 Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>:
> 1. What project ideas we have?

I have one more project of interest which I can mentor.

Topic. GiST API advancement

===Idea===
GiST API was designed at the beginning of 90th to reduce boilerplate
code around data access methods over balanced tree. Now, after 30
years, there are some ideas on improving this API.

===Project details===
Opclass developer must specify 4 core operations to make a type GiST-indexable:
1. Split: a function to split set of datatype instances into two parts.
2. Penalty calculation: a function to measure penalty for unification
of two keys.
3. Collision check: a function which determines whether two keys may
have overlap or are not intersecting.
4. Unification: a function to combine two keys into one so that
combined key collides with both input keys.

Functions 2 and 3 can be improved.
For example, Revised R*-tree[1] algorithm of insertion cannot be
expressed in terms of penalty-based algorithms. There was some
attempts to bring parts of RR*-tree insertion, but they come down to
ugly hacks [2]. Current GiST API, due to penalty-based insertion
algorithm, does not allow to implement important feature of RR*-tree:
overlap optimization. As Norbert Beckman, author of RR*-tree, put it
in discussion: “Overlap optimization is one of the main elements, if
not the main query performance tuning element of the RR*-tree. You
would fall back to old R-Tree times if that would be left off.”

Collision check currently returns binary result:
1. Query may be collides with subtree MBR
2. Query do not collides with subtree
This result may be augmented with a third state: subtree is totally
within query. In this case GiST scan can scan down subtree without key
checks.

Potential effect of these improvements must be benchmarked. Probably,
implementation of these two will spawn more ideas on GiST performance
improvements.

Finally, GiST do not provide API for bulk loading. Alexander Korotkov
during GSoC 2011 implemented buffered GiST build. This index
construction is faster, but yields the index tree with virtually same
querying performance. There are different algorithms aiming to provide
better indexing tree due to some knowledge of data, e.g. [3]

[1] Beckmann, Norbert, and Bernhard Seeger. "A revised r*-tree in
comparison with related index structures." Proceedings of the 2009 ACM
SIGMOD International Conference on Management of data. ACM, 2009.
[2] https://www.postgresql.org/message-id/flat/CAJEAwVFMo-FXaJ6Lkj8Wtb1br0MtBY48EGMVEJBOodROEGykKg%40mail(dot)gmail(dot)com#CAJEAwVFMo-FXaJ6Lkj8Wtb1br0MtBY48EGMVEJBOodROEGykKg(at)mail(dot)gmail(dot)com
[3] Achakeev, Daniar, Bernhard Seeger, and Peter Widmayer. "Sort-based
query-adaptive loading of r-trees." Proceedings of the 21st ACM
international conference on Information and knowledge management. ACM,
2012.

Best regards, Andrey Borodin.

In response to

  • GSoC 2017 at 2017-01-10 09:53:11 from Alexander Korotkov

Browse pgsql-hackers by date

  From Date Subject
Next Message Jesper Pedersen 2017-01-10 17:19:20 Re: pageinspect: Hash index support
Previous Message David Fetter 2017-01-10 17:00:01 Re: PoC: Make it possible to disallow WHERE-less UPDATE and DELETE