Skip site navigation (1) Skip section navigation (2)

Re: Index/Function organized table layout

From: Hannu Krosing <hannu(at)tm(dot)ee>
To: James Rogers <jamesr(at)best(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Index/Function organized table layout
Date: 2003-10-02 19:09:12
Message-ID: 1065121751.2535.18.camel@fuji.krosing.net (view raw or flat)
Thread:
Lists: pgsql-hackers
James Rogers kirjutas N, 02.10.2003 kell 20:50:

> To give a real world example, a standard query on one of our tables that
> has not been CLUSTER-ed recently (i.e. within the last several days)
> generates an average of ~2,000 cache misses.  Recently CLUSTER-ed, it
> generates ~0 cache misses on average.  Needless to say, one is *much*
> faster than the other. 

So what you really need is the CLUSTER command to leave pages half-empty
and the tuple placement logic on inserts/updates to place new tuples
near the place where they would be placed by CLUSTER. I.e. the code that
does actual inserting should be aware of CLUSTERing.

I guess that similar behaviour (half-empty pages, or even each second
page empty which is better as it creates less dirty buffers) could also
significantly speed up updates on huge number of tuples, as then code
could always select a place near the old one and thus avoid needless
head-movements between reading and writing areas.

> In my case, not only does CLUSTER-ing increase the number of concurrent
> queries possible without disk thrashing by an integer factor, but the
> number of buffers touched on a query that generates a cache misses is
> greatly reduced as well.  The problem is that CLUSTER-ing is costly and
> index-organizing some of the tables would reduce the buffer needs, since
> the index tuple in these cases are almost as large as the heap tuples
> they reference.

True, but my above suggestion would be much easier to implement
near-term. It seems to be a nice incremental improvement just needs 
to touch places:

1. selecting where new tuples go : 

  * updated ones go near old ones if not clustered and near the place
    CLUSTER would place them if clustered. 

  * inserted ones go to the less than half-empty pages if not clustered
    and near the place CLUSTER would place them if clustered. 

2. making reorganization code (CLUSTER and VACUUM FULL) to leave space 
in pages for clustered updates/inserts.

the "half" above could of course mean anything from 10% to 95% depending
on access patterns.

---------------------
Hannu


In response to

Responses

pgsql-hackers by date

Next:From: Tom LaneDate: 2003-10-02 19:10:46
Subject: Re: Large block size problems and notes...
Previous:From: Sean ChittendenDate: 2003-10-02 18:58:15
Subject: Large block size problems and notes...

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group