Re: Index/Function organized table layout

From: Hannu Krosing <hannu(at)tm(dot)ee>
To: James Rogers <jamesr(at)best(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Index/Function organized table layout
Date: 2003-10-02 19:09:12
Message-ID: 1065121751.2535.18.camel@fuji.krosing.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

James Rogers kirjutas N, 02.10.2003 kell 20:50:

> To give a real world example, a standard query on one of our tables that
> has not been CLUSTER-ed recently (i.e. within the last several days)
> generates an average of ~2,000 cache misses. Recently CLUSTER-ed, it
> generates ~0 cache misses on average. Needless to say, one is *much*
> faster than the other.

So what you really need is the CLUSTER command to leave pages half-empty
and the tuple placement logic on inserts/updates to place new tuples
near the place where they would be placed by CLUSTER. I.e. the code that
does actual inserting should be aware of CLUSTERing.

I guess that similar behaviour (half-empty pages, or even each second
page empty which is better as it creates less dirty buffers) could also
significantly speed up updates on huge number of tuples, as then code
could always select a place near the old one and thus avoid needless
head-movements between reading and writing areas.

> In my case, not only does CLUSTER-ing increase the number of concurrent
> queries possible without disk thrashing by an integer factor, but the
> number of buffers touched on a query that generates a cache misses is
> greatly reduced as well. The problem is that CLUSTER-ing is costly and
> index-organizing some of the tables would reduce the buffer needs, since
> the index tuple in these cases are almost as large as the heap tuples
> they reference.

True, but my above suggestion would be much easier to implement
near-term. It seems to be a nice incremental improvement just needs
to touch places:

1. selecting where new tuples go :

* updated ones go near old ones if not clustered and near the place
CLUSTER would place them if clustered.

* inserted ones go to the less than half-empty pages if not clustered
and near the place CLUSTER would place them if clustered.

2. making reorganization code (CLUSTER and VACUUM FULL) to leave space
in pages for clustered updates/inserts.

the "half" above could of course mean anything from 10% to 95% depending
on access patterns.

---------------------
Hannu

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2003-10-02 19:10:46 Re: Large block size problems and notes...
Previous Message Sean Chittenden 2003-10-02 18:58:15 Large block size problems and notes...