Re: Fractal tree indexing

From: Andrew Borodin <borodin(at)octonica(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, josh(at)agliodbs(dot)com, daniel(at)heroku(dot)com, andrew(at)dunslane(dot)net
Subject: Re: Fractal tree indexing
Date: 2016-11-16 07:46:50
Message-ID: CAJEAwVE_8VDTV1Utfe1CmbVRCvVPtvHZnSKFdMf0rB5mELSLeA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi hackers!

Here is prototype of procrastinating GiST. Or lazy GiST. Or buffered GiST.
Indeed, there is nothing fractal about it.

The concept is leaf tuples stall on internal pages. From time to time they
are pushed down in batches. This code is far from perfect, I've coded this
only for the research purposes.

I have tested code with insertion of random 3D cubes. On 1 million of
insertions classic GiST is 8% faster, on 3M performance is equal, on 30M
lazy GiST if 12% faster.
I've tested it with SSD disk, may be with HDD difference will be more
significant.
Test scenario is here https://github.com/x4m/postgres_g/blob/bufdev/test.sql

In theory, this is cache friendly implementation (but not cache oblivious)
and it must be faster even for small datasets without huge disk work. But
it has there extra cost of coping batch of tuples to palloc`d array,
couln't avoid that.

This is just a proof-of-concept for performance measrures:
1. make check fails for two tests
2. WAL is broken
3. code is a mess in some places

I'm not sure 12% of performance worth it. I'll appreciate any ideas on what
to do next.

Best regards, Andrey Borodin.

2013-02-13 22:54 GMT+05:00 Simon Riggs <simon(at)2ndquadrant(dot)com>:

> On 13 February 2013 16:48, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
> wrote:
> > On 13.02.2013 18:20, Tom Lane wrote:
> >>
> >> Heikki Linnakangas<hlinnakangas(at)vmware(dot)com> writes:
> >>>
> >>> The basic idea of a fractal tree index is to attach a buffer to every
> >>> non-leaf page. On insertion, instead of descending all the way down to
> >>> the correct leaf page, the new tuple is put on the buffer at the root
> >>> page. When that buffer fills up, all the tuples in the buffer are
> >>> cascaded down to the buffers on the next level pages. And recursively,
> >>> whenever a buffer fills up at any level, it's flushed to the next
> level.
> >>
> >>
> >> [ scratches head... ] What's "fractal" about that? Or is that just a
> >> content-free marketing name for this technique?
> >
> >
> > I'd call it out as a marketing name. I guess it's fractal in the sense
> that
> > all levels of the tree can hold "leaf tuples" in the buffers; the
> structure
> > looks the same no matter how deep you zoom, like a fractal.. But
> "Buffered"
> > would be more appropriate IMO.
>
> I hope for their sake there is more to it than that. It's hard to see
> how buffering can be patented.
>
> --
> Simon Riggs http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>
>

Attachment Content-Type Size
lazygist_v1.diff text/plain 16.1 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Yury Zhuravlev 2016-11-16 08:05:55 Re: WIP: About CMake v2
Previous Message Etsuro Fujita 2016-11-16 07:38:44 Re: Push down more UPDATEs/DELETEs in postgres_fdw