Re: Our CLUSTER implementation is pessimal

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: Postgres <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Our CLUSTER implementation is pessimal
Date: 2009-01-08 00:24:12
Message-ID: 200901080024.n080OCt27964@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Added to TODO:

Improve CLUSTER performance by sorting to reduce random I/O

* http://archives.postgresql.org/pgsql-hackers/2008-08/msg01371.php

---------------------------------------------------------------------------

Gregory Stark wrote:
>
> One thing that's been annoying me for a while is that our CLUSTER
> implementation is really very slow. When I say very slow I mean it's really
> very very very slow.
>
> The problem is that it does a full index scan and looks up each tuple in the
> order of the index. That means it a) is doing a lot of random i/o and b) has
> to access the same pages over and over again.
>
> Our query planner knows better and normally does a sequential heap scan and
> sort for queries of this type. But CLUSTER has to use the index.
>
> Just the other day I wanted to randomize the order of a table so I added a
> random column, indexed it and started cluster running. After 6 hours I looked
> at how far it had gotten and found it was only about 20% done rewriting the
> heap (not including rebuilding the indexes). I cancelled it and built a new
> table in the desired orderwith CREATE TABLE AS ... ORDER BY in 45 minutes!
>
> There are a couple problems with this:
>
> a) We need some way to decide *when* to do a sort and when to do an index
> scan. The planner has all this machinery but we don't really have all the
> pieces handy to use it in a utility statement. This is especially important
> for the case where we're doing a cluster operation on a table that's already
> clustered. In that case an index scan could conceivably actually win (though I
> kind of doubt it). I don't really have a solution for this.
>
> b) tuplesort no longer has the pieces needed to sort whole tuples including
> visibility info. And actually even the old pieces that were removed had not
> quite the right interface and behaviour. We need to preserve t_self for the
> heap rewrite tools and we need to be able to use _bt_mkscankey_nodata() to
> generate the scan keys that match the index. The cleanest approach I think is
> to just add back in the old raw tuple methods to tuplesort and tweak them to
> preserve t_self and take Scankeys instead of attnums etc.
>
> c) expression indexes are a problem. We would have to start with constructing
> new tuples to hold the expression values but retain the entire original heap
> tuple. Perhaps pass both an indextuple and a heaptuple to tuplesort and store
> both in the sorttuple? For now I figure we can just punt on expression indexes
> and always do an index sacn.
>
> I tested out the idea and it works reasonably well. The code might need some
> cleanup including, I think refactoring cluster_rel() and possibly tweaking the
> interface to tuplesort. But I'm fairly happy with it.
>
> The results clustering a 2.2GB table with 2,000,000 rows on my 1.5GB laptop
> using maintenance_work_mem of 768MB and work_mem of 256MB:
>
> postgres=# \d x
>
> Table "public.x"
> Column | Type | Modifiers
> --------+------------------+-----------
> i | integer |
> r | double precision |
> repeat | text |
> Indexes:
> "xi" btree (i)
> "xi2" btree ((i + 0))
> "xir" btree (r) CLUSTER
>
> postgresql=# CLUSTER xi ON x;
> CLUSTER
> Time: 736029.322 ms
>
> postgresql=# CLUSTER xir ON x;
> CLUSTER
> Time: 723610.507 ms
>
> postgresql=# CLUSTER xi2 ON x;
> CLUSTER
> Time: 12842793.346 ms
>
>
> That's 3.5 hours for traditional cluster and 12 minutes for cluster using a
> sort...
>

[ Attachment, skipping... ]

>
> --
> Gregory Stark
> EnterpriseDB http://www.enterprisedb.com
> Ask me about EnterpriseDB's RemoteDBA services!
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2009-01-08 00:25:46 Re: parallel restore
Previous Message Tom Lane 2009-01-08 00:16:24 Re: Updates of SE-PostgreSQL 8.4devel patches (r1389)