Re: GSOC Student Project Idea

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Michael Schuh <schuh(dot)mike(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GSOC Student Project Idea
Date: 2013-04-23 21:25:47
Message-ID: CAPpHfdskjgBMsXd-dLMnhFG-8c2KRyZxseKkByV_6gev6ea6GQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Michael,

On Mon, Apr 22, 2013 at 4:28 PM, Michael Schuh <schuh(dot)mike(at)gmail(dot)com> wrote:

> Although I do not have a lot of experience with PostgreSQL development, I
> am eager to learn and commit my summer to enabling another fantastic
> feature for the community. Since iDistance is a non-recursive, data-driven,
> space-based partitioning strategy which builds directly onto a B+-tree, I
> believe the implementation should be possible using only GiST support.
> Please let me know if this is of any interest, or if you have any
> additional questions. Unfortunately, I will be unavailable most of the day,
> but I plan to fill out the GSOC application later this evening.
>

I've taken a brief look on the paper and implementation. As I can see
iDistance implements some global building strategy. I mean, for example, it
selects some point, calculates distances from selected point to all points
in dataset etc. So, it uses the whole dataset at the same time.

GiST and SP-GiST now behaves differently. They have interface functions
which operates at maximum very small portion of data which fits to disk
page. You can try to fit iDistance into GiST or SP-GiST interface. But will
it be that efficient in this case? I've experimented with fitting VP-tree
(which looks similar to iDistance at first glance) into SP-GiST. Then my
results was depressing: it was useless with levenshtein distance, and it
was worse than R-tree with multidimensional points. So I think we really
need global index building algorithm in order to reveal power of such data
structures. But GiST and SP-GiST currently doesn't support it.

However you can try to implement global index building in GiST or SP-GiST.
In this case I think you should carefully estimate your capabilities during
single GSoC project. You would need to extend GiST or SP-GiST interface and
write completely new implementation of tree building algorithm. Question of
how to exactly extend GiST or SP-GiST interface for this case could appear
to be very hard even theoretically.

In short my opinion is so: don't expect miracle by just implementing GiST
or SP-GiST interface functions.

------
With best regards,
Alexander Korotkov.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kevin Grittner 2013-04-23 21:34:29 Re: REFRESH MATERIALIZED VIEW command in PL block hitting Assert
Previous Message Bruce Momjian 2013-04-23 21:17:59 Re: 9.3 Beta1 status report