Re: Maintaining cluster order on insert

From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: Maintaining cluster order on insert
Date: 2007-05-15 22:26:51
Message-ID: 464A33AB.1050401@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

Ah, thanks! I had forgotten about it as well.

Bruce Momjian wrote:
> [ Sorry I found this one only found recently.]
>
> Your patch has been added to the PostgreSQL unapplied patches list at:
>
> http://momjian.postgresql.org/cgi-bin/pgpatches
>
> It will be applied as soon as one of the PostgreSQL committers reviews
> and approves it.
>
> ---------------------------------------------------------------------------
>
>
> Heikki Linnakangas wrote:
>> While thinking about index-organized-tables and similar ideas, it
>> occurred to me that there's some low-hanging-fruit: maintaining cluster
>> order on inserts by trying to place new heap tuples close to other
>> similar tuples. That involves asking the index am where on the heap the
>> new tuple should go, and trying to insert it there before using the FSM.
>> Using the new fillfactor parameter makes it more likely that there's
>> room on the page. We don't worry about the order within the page.
>>
>> The API I'm thinking of introduces a new optional index am function,
>> amsuggestblock (suggestions for a better name are welcome). It gets the
>> same parameters as aminsert, and returns the heap block number that
>> would be optimal place to put the new tuple. It's be called from
>> ExecInsert before inserting the heap tuple, and the suggestion is passed
>> on to heap_insert and RelationGetBufferForTuple.
>>
>> I wrote a little patch to implement this for btree, attached.
>>
>> This could be optimized by changing the existing aminsert API, because
>> as it is, an insert will have to descend the btree twice. Once in
>> amsuggestblock and then in aminsert. amsuggestblock could keep the right
>> index page pinned so aminsert could locate it quicker. But I wanted to
>> keep this simple for now. Another improvement might be to allow
>> amsuggestblock to return a list of suggestions, but that makes it more
>> expensive to insert if there isn't room in the suggested pages, since
>> heap_insert will have to try them all before giving up.
>>
>> Comments regarding the general idea or the patch? There should probably
>> be a index option to turn the feature on and off. You'll want to turn it
>> off when you first load a table, and turn it on after CLUSTER to keep it
>> clustered.
>>
>> Since there's been discussion on keeping the TODO list more up-to-date,
>> I hereby officially claim the "Automatically maintain clustering on a
>> table" TODO item :). Feel free to bombard me with requests for status
>> reports. And just to be clear, I'm not trying to sneak this into 8.2
>> anymore, this is 8.3 stuff.
>>
>> I won't be implementing a background daemon described on the TODO item,
>> since that would essentially be an online version of CLUSTER. Which sure
>> would be nice, but that's a different story.
>>
>> - Heikki
>>
>
>

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jim C. Nasby 2007-05-15 22:29:30 Re: Managing the community information stream
Previous Message Bruce Momjian 2007-05-15 22:24:11 Re: 8.3 pending patch queue

Browse pgsql-patches by date

  From Date Subject
Next Message Jim C. Nasby 2007-05-16 00:08:07 Re: Automatic adjustment of bgwriter_lru_maxpages
Previous Message Bruce Momjian 2007-05-15 22:23:19 Re: Maintaining cluster order on insert