Re: cluster index on a table

From: "phb07(at)apra(dot)asso(dot)fr" <phb07(at)apra(dot)asso(dot)fr>
To: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: cluster index on a table
Date: 2009-07-17 13:25:52
Message-ID: 20090717132551.88E744B002C@smtp2-g21.free.fr
Lists: pgsql-performance

Hi all,

>On Wed, Jul 15, 2009 at 10:36 PM, Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com> wrote:
>>I'd love to see it.
>
>+1 for index organized tables
>
>--Scott

+1 from me too...

I am currently working for a large customer who is migrating his main application to PostgreSQL; this application currently uses DB2 and RFM-II (an RDBMS used on Bull GCOS 8 mainframes). With both RDBMSs, "cluster indexes" are used, and data rows are stored according to these indexes. The benefits are:
- a good performance level, especially for batch chains that more or less "scan" many large tables,
- and table reorganisations remain infrequent (about once a month).
To keep a good performance level with PostgreSQL, I expect that we will need more frequent reorganisation operations, with the drawbacks this creates for the production schedules. This is one of the very few regressions we need to address (or maybe the only one).

Despite my currently limited knowledge of the Postgres internals, I don't see why it should be difficult to simply adapt the logic used to determine the data row location at insert time, with something like:
- read the cluster index to find the tid of the row whose key value is just less than the key value of the row to insert,
- if there is enough free space in that same page (thanks to FILLFACTOR or previous row deletions), use it,
- otherwise, use the first available space found through the FSM (free space map).
This doesn't change anything in the MVCC mechanism, doesn't change the index structure or its management, and doesn't require moving data rows.
It doesn't ensure that all rows are always in the "right" order, but if FILLFACTOR is set correctly, most rows will be well placed, so reorganisation will be needed less often.
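The three insert-time steps above can be sketched as a toy Python model. This is only an illustration under stated assumptions, not PostgreSQL code: `ToyClusteredHeap`, `PAGE_CAPACITY`, `bulk_load` and `_fsm_target` are hypothetical names, a page is just a list of keys with a fixed row capacity, the cluster index is a sorted list of (key, page) pairs, and the "FSM" is a linear scan for the first page still under the fillfactor limit (the real free space map is a different structure). As I read the proposal, the neighbour's page may be filled beyond the fillfactor, since that reserved space is what the scheme relies on, while initial loads and FSM fallbacks stop at the fillfactor.

```python
import bisect

PAGE_CAPACITY = 8  # hypothetical: max rows per heap page in this toy model


class ToyClusteredHeap:
    """Toy model of the proposed cluster-aware insert (not PostgreSQL code)."""

    def __init__(self, fillfactor=0.75):
        self.fillfactor = fillfactor
        self.pages = []  # heap pages; each page is a list of keys
        self.index = []  # sorted (key, page_no) pairs standing in for the cluster index

    def _place(self, key, page_no):
        # Store the row on the chosen page and record its location in the index.
        self.pages[page_no].append(key)
        bisect.insort(self.index, (key, page_no))
        return page_no

    def _fsm_target(self):
        # Stand-in for the FSM: first page still below the fillfactor limit.
        limit = int(PAGE_CAPACITY * self.fillfactor)
        for page_no, page in enumerate(self.pages):
            if len(page) < limit:
                return page_no
        self.pages.append([])  # no room anywhere: extend the heap
        return len(self.pages) - 1

    def bulk_load(self, keys):
        # Initial load (e.g. right after a reorganisation): sorted rows,
        # each page packed only up to the fillfactor.
        for key in sorted(keys):
            self._place(key, self._fsm_target())

    def insert(self, key):
        # Step 1: find the index entry with the largest key strictly less than `key`.
        pos = bisect.bisect_left(self.index, (key, -1))
        if pos > 0:
            _, neighbour_page = self.index[pos - 1]
            # Step 2: reuse the neighbour's page if it has any room left,
            # including the space held back by the fillfactor.
            if len(self.pages[neighbour_page]) < PAGE_CAPACITY:
                return self._place(key, neighbour_page)
        # Step 3: no smaller key, or its page is full: fall back to the FSM.
        return self._place(key, self._fsm_target())
```

For example, after `bulk_load(range(0, 80, 10))` with `fillfactor=0.5`, inserting 15 lands next to 10 on page 0, while inserting -5 (no smaller key exists) goes through the FSM fallback. Once page 0 is completely full, a further key such as 14 also falls back to the FSM and ends up away from its neighbours, which is the residual disorder that periodic reorganisation would still have to fix.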
But I am probably missing something ;-)

Regards. Philippe Beaudoin.
