Skip site navigation (1) Skip section navigation (2)

Maintaining cluster order on insert

From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: pgsql-hackers(at)postgresql(dot)org, pgsql-patches(at)postgresql(dot)org
Subject: Maintaining cluster order on insert
Date: 2006-08-09 19:04:17
Message-ID: 44DA31B1.3090700@enterprisedb.com (view raw or flat)
Thread:
Lists: pgsql-hackerspgsql-patches
While thinking about index-organized-tables and similar ideas, it 
occurred to me that there's some low-hanging-fruit: maintaining cluster 
order on inserts by trying to place new heap tuples close to other 
similar tuples. That involves asking the index am where on the heap the 
new tuple should go, and trying to insert it there before using the FSM. 
Using the new fillfactor parameter makes it more likely that there's 
room on the page. We don't worry about the order within the page.

The API I'm thinking of introduces a new optional index am function, 
amsuggestblock (suggestions for a better name are welcome). It gets the 
same parameters as aminsert, and returns the heap block number that 
would be optimal place to put the new tuple. It's be called from 
ExecInsert before inserting the heap tuple, and the suggestion is passed 
on to heap_insert and RelationGetBufferForTuple.

I wrote a little patch to implement this for btree, attached.

This could be optimized by changing the existing aminsert API, because 
as it is, an insert will have to descend the btree twice. Once in 
amsuggestblock and then in aminsert. amsuggestblock could keep the right 
index page pinned so aminsert could locate it quicker. But I wanted to 
keep this simple for now. Another improvement might be to allow 
amsuggestblock to return a list of suggestions, but that makes it more 
expensive to insert if there isn't room in the suggested pages, since 
heap_insert will have to try them all before giving up.

Comments regarding the general idea or the patch? There should probably 
be a index option to turn the feature on and off. You'll want to turn it 
off when you first load a table, and turn it on after CLUSTER to keep it 
clustered.

Since there's been discussion on keeping the TODO list more up-to-date, 
I hereby officially claim the "Automatically maintain clustering on a 
table" TODO item :). Feel free to bombard me with requests for status 
reports. And just to be clear, I'm not trying to sneak this into 8.2 
anymore, this is 8.3 stuff.

I won't be implementing a background daemon described on the TODO item, 
since that would essentially be an online version of CLUSTER. Which sure 
would be nice, but that's a different story.

- Heikki


Attachment: clustermaintenance.diff
Description: text/x-patch (34.1 KB)

Responses

pgsql-hackers by date

Next:From: Jim C. NasbyDate: 2006-08-09 19:14:18
Subject: Re: An Idea for planner hints
Previous:From: Jim C. NasbyDate: 2006-08-09 19:00:53
Subject: Re: An Idea for planner hints

pgsql-patches by date

Next:From: Greg Sabino MullaneDate: 2006-08-09 19:54:29
Subject: Fix statement timing display
Previous:From: Tom LaneDate: 2006-08-09 18:27:33
Subject: Re: Plugins redux (was Re: [PATCHES] PL instrumentation plugin support)

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group