Re: [HACKERS] sorting big tables :(

From: The Hermit Hacker <scrappy(at)hub(dot)org>
To: Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us>
Cc: mimo(at)interdata(dot)com(dot)pl, hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] sorting big tables :(
Date: 1998-05-20 14:50:11
Message-ID: Pine.BSF.3.96.980520103925.14056O-100000@hub.org
Lists: pgsql-hackers

On Wed, 20 May 1998, Bruce Momjian wrote:

> > > Well, I think it might be optimised slightly. Am I right that postgres
> > > uses heap files (i.e. they look like tables) during sorting? While this
> > > is a merge sort, those files don't have to be table-like files.
> > > Certainly, they could be variable-length records without pages (aren't
> > > they read sequentially anyway?). Moreover, we could consider packing the
> > > tape files before writing them out if necessary. Of course that will
> > > cost some performance, but it's better to lose some performance than to
> > > be unable to sort at all.
> > >
> > > Last question... What's the purpose of such a big sort? If somebody gets
> > > 40M sorted records as the result of some query, what is he going to do
> > > with them? Is he going to spend the next few years reading through them?
> > > I mean, isn't it better to query the database for only the necessary
> > > information and then sort that?
> >
> > This I don't know...I never even really thought about that,
> > actually...Michael? :) Only you can answer that one.
>
> I have an idea. Can he run CLUSTER on the data? If so, the sort will
> not use small batches, and the disk space during sort will be reduced.
> However, I think CLUSTER will NEVER finish on such a file, unless it is
> already pretty well sorted.
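
For concreteness, what Bruce is suggesting would be something like
this (table and field names made up for illustration):

    -- build an index on the sort field; CLUSTER then rewrites the
    -- table in that index's order
    CREATE INDEX bigtable_key_idx ON bigtable (key);
    CLUSTER bigtable_key_idx ON bigtable;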

Okay...then we *do* have a table size limit problem? Tables that
just get too large to be manageable? Maybe this is one area we should be
looking at as far as performance is concerned?

One thing that just popped to mind, concerning the above CLUSTER
command...what would it take to have *auto-clustering*? Maybe provide a
means of marking a field in a table for this purpose?

One of the things that the Unix FS does, at least UFS, is
auto-defragmenting. Whenever the system is idle (from my
understanding), the kernel uses that time to clean up the file systems
and reduce fragmentation.

This is by no means SQL92, but it would be a neat
"extension"...let me specify a "CLUSTER on" field for a table. Then, as
data is entered into the database, the system would periodically check
how fragmented (out of order) the data has become and clean it up
accordingly. If done by the system reasonably often, it shouldn't take
up *too* much time, as most of the data should already be in order...
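
Just as a sketch, the marking might look something like this (purely
hypothetical syntax...none of this exists today):

    -- mark "key" as the clustering field for bigtable (hypothetical)
    ALTER TABLE bigtable CLUSTER ON key;

...after which the system would re-CLUSTER on its own, in the
background, whenever enough new rows have landed out of order.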

That would have the side-benefit of speeding up "ORDER BY" on
that field as well...
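
i.e., with bigtable kept in "key" order, a query like:

    SELECT * FROM bigtable ORDER BY key;

...would find the rows already nearly sorted, leaving the sort pass
very little work to do.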
