Re: GSoC 2014 proposal

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Иван Парфилов <iparfilov(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GSoC 2014 proposal
Date: 2014-04-02 10:22:20
Message-ID: CAPpHfdv=N2YVM8xYp3O-Kco1u9e8CDupdrhmMn_5+nYhcDY_PA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Apr 1, 2014 at 2:23 PM, Heikki Linnakangas
<hlinnakangas(at)vmware(dot)com>wrote:

> The BIRCH algorithm as described in the paper describes building a tree in
> memory. If I understood correctly, you're suggesting to use a pre-built
> GiST index instead. Interesting idea!
>
> There are a couple of signifcant differences between the CF tree described
> in the paper and GiST:
>
> 1. In GiST, a leaf item always represents one heap tuple. In the CF tree,
> a leaf item represents a cluster, which consists of one or more tuples. So
> the CF tree doesn't store an entry for every input tuple, which makes it
> possible to keep it in memory.
>
> 2. In the CF tree, "all entries in a leaf node must satisfy a threshold
> requirement, with respect to a threshold value T: the diameter (or radius)
> has to be less than T". GiST imposes no such restrictions. An item can
> legally be placed anywhere in the tree; placing it badly will just lead to
> degraded search performance, but it's still a legal GiST tree.
>
> 3. A GiST index, like any other index in PostgreSQL, holds entries also
> for deleted tuples, until the index is vacuumed. So you cannot just use
> information from a non-leaf node and use it in the result, as the
> information summarized at a non-leaf level includes noise from the dead
> tuples.
>
> Can you elaborate how you are planning to use a GiST index to implement
> BIRCH? You might also want to take a look at SP-GiST; SP-GiST is more
> strict in where in the tree an item can be stored, and lets the operator
> class to specify exactly when a node is split etc.
>

Hmmm, it's likely I've imagined something quite outside of this paper, and
even already suggested it to Ivan... :)
I need a little time to rethink it.

------
With best regards,
Alexander Korotkov.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2014-04-02 11:59:03 Re: Including replication slot data in base backups
Previous Message Andres Freund 2014-04-02 09:58:10 Re: Including replication slot data in base backups