Re: [HACKERS] sorting big tables :(

From: Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us>
To: scrappy(at)hub(dot)org (The Hermit Hacker)
Cc: mimo(at)interdata(dot)com(dot)pl, hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] sorting big tables :(
Date: 1998-05-20 15:02:10
Message-ID: 199805201502.LAA14967@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> > I have an idea. Can he run CLUSTER on the data? If so, the sort will
> > not use small batches, and the disk space during sort will be reduced.
> > However, I think CLUSTER will NEVER finish on such a file, unless it is
> > already pretty well sorted.
>
> Okay...then we *do* have a table size limit problem? Tables that
> just get too large to be manageable? Maybe this is one area we should be
> looking at as far as performance is concerned?

Well, cluster moves one row at a time, so if the table is very
fragmented, the code is slow because it is seeking all over the table.
See the cluster manual pages for an alternate solution, the uses ORDER
BY.

>
> One thing that just pop'd to mind, concerning the above CLUSTER
> command...what would it take to have *auto-cluster'ng*? Maybe provide a
> means of marking a field in a table for this purpose?

Hard to do. That's what we have indexes for.

>
> One of the things that the Unix FS does is auto-defragmenting, at
> least the UFS one does. Whenever the system is idle (from my
> understanding), the kernel uses that time to clean up the file systems, to
> reduce the file system fragmentation.
>
> This is by no means SQL92, but it would be a neat
> "extension"...let me specify a "CLUSTER on" field. Then, as I'm entering
> data into the database, periodically check for fragmentation of the data
> and clean up accordingly. If done by the system, reasonably often, it
> shouldn't take up *too* much time, as most of the data should already be
> in order...
>
> That would have the side-benefit of speeding up the "ORDER by" on
> that field also...

We actually can have a CLUSTER ALL command, that does this. No one has
implemented it yet.

--
Bruce Momjian | 830 Blythe Avenue
maillist(at)candle(dot)pha(dot)pa(dot)us | Drexel Hill, Pennsylvania 19026
+ If your life is a hard drive, | (610) 353-9879(w)
+ Christ can be your backup. | (610) 853-3000(h)

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 1998-05-20 15:33:34 Re: [HACKERS] Re: [DOCS] Re: FE/BE protocol revision patch
Previous Message Thomas G. Lockhart 1998-05-20 15:02:08 Re: [HACKERS] Re: [DOCS] Re: FE/BE protocol revision patch