Re: I: About "Our CLUSTER implementation is pessimal" patch

From: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
To: Josh Kupershmidt <schmiddy(at)gmail(dot)com>
Cc: Leonardo Francalanci <m_lists(at)yahoo(dot)it>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: I: About "Our CLUSTER implementation is pessimal" patch
Date: 2010-09-29 05:25:38
Message-ID: AANLkTini6r3EvJ6XkLk9tPkSt-+g55SF52wwNa6gfrj5@mail.gmail.com
Lists: pgsql-hackers

On Wed, Sep 29, 2010 at 12:53 PM, Josh Kupershmidt <schmiddy(at)gmail(dot)com> wrote:
> I thought this paragraph was a little confusing:

Thanks for checking.

> !     In the second case, a full table scan is followed by a sort operation.
> !     The method is faster than the first one when the table is highly
> fragmented.
> !     You need twice disk space of the sum in the case. In addition to the free
> !     space needed by the previous case, this approach may also need a temporary
> !     disk sort file which can be as big as the original table.
>
> I think the worst-case disk space could be made a little more clear
> here, and maybe some general wordsmithing as well. I wasn't sure what
> "twice disk space of the sum" was in this description -- sum of what
> (table and all indexes?).

To be exact, it's quite complex.
While rebuilding the table, CLUSTER requires about twice the disk space
of the old table (for the sort tapes and the new table).
After sorting the table, CLUSTER performs a REINDEX. At that point we need
{the size of the new table} + {twice the disk space of the new indexes}.
Also, the new relations will be the same size as the old relations if
they contain no free space.

So, I think "twice the disk space of the sum of the table and indexes"
would be the simplest explanation that still leaves a safe margin.
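
As a rough sketch of that arithmetic (not taken from the patch: the function
name, the parameter names, and the reading of each figure as extra space on
top of the old relations are assumptions made for illustration), the
per-phase estimates can be compared with the simplified figure like this:

```python
def cluster_extra_space(table_bytes: int, index_bytes: int) -> dict:
    """Rough extra disk space for a scan-and-sort CLUSTER, per phase.

    Hypothetical helper; the sizes are worst-case estimates that assume
    the new relations are as large as the old ones (no free space).
    """
    # Rebuild phase: sort tapes plus the new table, each up to the
    # size of the old table.
    rebuild = 2 * table_bytes
    # REINDEX phase: the new table is still on disk, plus about twice
    # the size of the new indexes.
    reindex = table_bytes + 2 * index_bytes
    # Simplified figure: twice the sum of the table and its indexes.
    safe_margin = 2 * (table_bytes + index_bytes)
    return {
        "rebuild": rebuild,
        "reindex": reindex,
        "peak": max(rebuild, reindex),
        "safe_margin": safe_margin,
    }


if __name__ == "__main__":
    gb = 1024 ** 3
    for phase, size in cluster_extra_space(10 * gb, 4 * gb).items():
        print(f"{phase}: {size / gb:.0f} GB")
```

Under these assumptions, a 10 GB table with 4 GB of indexes needs about
20 GB while rebuilding and 18 GB while reindexing, and the simplified
figure of 28 GB covers both.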

> Also, AIUI, this second clustering method is similar to the older
> idiom of CREATE TABLE new AS SELECT * FROM old ORDER BY col; Since the
> paragraph describing this older idiom is being removed, perhaps a
> brief mention in the documentation could be made of this similarity.

Good idea.

> Some more wordsmithing: change
> !      The planner tries to choose a faster method in them base on the
> information
> to:
> !      The planner tries to choose the fastest method based on the information

Thanks.

--
Itagaki Takahiro
