Re: CLUSTER and clustered indices

From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: CLUSTER and clustered indices
Date: 2005-11-18 00:57:38
Message-ID: 20051118005738.GE10976@surnet.cl
Lists: pgsql-hackers

Simon Riggs wrote:
> On Thu, 2005-11-17 at 10:58 -0500, Tom Lane wrote:
> > Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
>
> The use case exists and the technique is low overhead, but the main
> question is: Does anybody think this behaviour would be beneficial for
> them? (I'm actually in two minds myself, but once the idea has arisen,
> it seems sensible to discuss this for everybody's sake).

I have no use for it but I see it would be beneficial in some cases.

> The trade-off is a table that keeps growing in size, even though you
> VACUUM it, with the benefit that the clustering is maintained.
>
> So how would you maintain it? Looks like you'd still have to use regular
> CLUSTER commands, but at least it would stay good in between.

Yeah, this is a problem. The growth is unbounded. Even if there's a
completely empty page somewhere, it can't be used because all tuples
will go to the last page. The problem with using CLUSTER for
maintenance is that it takes an exclusive lock on the table, which is a
thing we've been running away from. You are right in that it's much
cheaper than CLUSTERing a table that isn't ordered, because there's much
more locality. But I don't think it's a big enough win.
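
To make the growth argument concrete, here is a toy model in Python. It is
only a sketch and not PostgreSQL code: the page capacity, the workload shape
and the function name simulate are all assumptions made up for illustration.
It compares a heap that may reuse free space on any page with one where new
tuples may only go to the last page.

```python
from collections import deque

PAGE_CAPACITY = 4  # tuples per page; a toy value, not a PostgreSQL constant


def simulate(rounds, batch, reuse_free_space):
    """Return the number of heap pages after a steady delete/insert workload.

    The table starts with `batch` live tuples. Each round deletes the oldest
    `batch` tuples (as if VACUUM had already reclaimed their space) and then
    inserts `batch` new ones, so the live tuple count never changes.
    """
    pages = []      # pages[i] = number of live tuples on page i
    live = deque()  # page number of each live tuple, oldest first

    def insert():
        target = None
        if reuse_free_space:
            # Ordinary heap: any page with free room may take the tuple.
            target = next(
                (i for i, n in enumerate(pages) if n < PAGE_CAPACITY), None
            )
        elif pages and pages[-1] < PAGE_CAPACITY:
            # Append-only heap: only the last page is ever considered.
            target = len(pages) - 1
        if target is None:
            # No usable page, so extend the table by one page.
            pages.append(0)
            target = len(pages) - 1
        pages[target] += 1
        live.append(target)

    for _ in range(batch):
        insert()
    for _ in range(rounds):
        for _ in range(batch):
            pages[live.popleft()] -= 1
        for _ in range(batch):
            insert()
    return len(pages)


if __name__ == "__main__":
    for rounds in (10, 100):
        print(rounds, simulate(rounds, 8, True), simulate(rounds, 8, False))
```

In this model the free-space-reusing heap stays at the same number of pages
no matter how many rounds are run, while the append-only heap gains pages
every round even though the number of live tuples is constant.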

Because of the drawbacks (unbounded growth being the most prominent one)
this would have to be an optional thing. This means we would need an
additional system catalog column to record whether it's active or not,
and a user command to activate it. So it's starting to be a more
invasive thing. Not that these things matter a whole lot, but anyway.

Personally I'd prefer to see index-ordered heaps, where the heap is
itself an index, so the ordering is automatically kept.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
