Re: [PERFORM] Releasing memory during External sorting?

From: "Dann Corbit" <DCorbit(at)connx(dot)com>
To: "Ron Peacetree" <rjpeace(at)earthlink(dot)net>, <pgsql-hackers(at)postgresql(dot)org>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [PERFORM] Releasing memory during External sorting?
Date: 2005-09-26 17:17:26
Message-ID: D425483C2C5C9F49B5B7A41F8944154757D107@postal.corporate.connx.com
Lists: pgsql-hackers

> -----Original Message-----
> From: Ron Peacetree [mailto:rjpeace(at)earthlink(dot)net]
> Sent: Saturday, September 24, 2005 3:31 AM
> To: Dann Corbit; pgsql-hackers(at)postgresql(dot)org; pgsql-
> performance(at)postgresql(dot)org
> Subject: RE: [HACKERS] [PERFORM] Releasing memory during External sorting?
>
> From: Dann Corbit <DCorbit(at)connx(dot)com>
> Sent: Sep 23, 2005 5:38 PM
> Subject: RE: [HACKERS] [PERFORM] Releasing memory during External sorting?
>
> >_C Unleashed_ also explains how to use a callback function to perform
> >arbitrary radix sorts (you simply need a method that returns the
> >[bucketsize] most significant bits for a given data type, for the
> >length of the key).
> >
> >So you can sort fairly arbitrary data in linear time (of course if the
> >key is long then O(n*log(n)) will be better anyway.)
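
For what it's worth, the callback scheme boils down to something like
the sketch below. This is a from-scratch toy for this message, not the
book's sample code; get_bits() is an assumed user-supplied callback
returning the next BUCKET_BITS bits of the key, counted from the most
significant end.

    #include <stddef.h>
    #include <string.h>

    #define BUCKET_BITS 8
    #define NBUCKETS (1u << BUCKET_BITS)

    /* User callback: bits [shift, shift + BUCKET_BITS) of the key,
       numbered from the most significant bit. */
    typedef unsigned (*get_bits_fn)(const void *key, unsigned shift);

    static void radix_pass(void **recs, void **work, size_t n,
                           unsigned shift, unsigned key_bits,
                           get_bits_fn get_bits)
    {
        size_t count[NBUCKETS] = {0};
        size_t start[NBUCKETS], pos[NBUCKETS];
        size_t i;
        unsigned b;

        if (n < 2 || shift >= key_bits)
            return;

        for (i = 0; i < n; i++)              /* count each bucket */
            count[get_bits(recs[i], shift)]++;

        start[0] = 0;                        /* prefix sums give offsets */
        for (b = 1; b < NBUCKETS; b++)
            start[b] = start[b - 1] + count[b - 1];

        memcpy(pos, start, sizeof pos);      /* stable distribution */
        for (i = 0; i < n; i++)
            work[pos[get_bits(recs[i], shift)]++] = recs[i];
        memcpy(recs, work, n * sizeof *recs);

        for (b = 0; b < NBUCKETS; b++)       /* recurse on each bucket */
            radix_pass(recs + start[b], work + start[b], count[b],
                       shift + BUCKET_BITS, key_bits, get_bits);
    }

Each record is touched once per BUCKET_BITS of key, which is the
linear-time behavior described above; the long-key caveat shows up as
recursion depth growing with key_bits.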
> >
> >But in any case, if we are talking about external sorting, then disk
> >time will be so totally dominant that the choice of algorithm is
> >practically irrelevant.
> >
> Horsefeathers. Jim Gray's sorting contest site:
> http://research.microsoft.com/barc/SortBenchmark/
>
> proves that the choice of algorithm can have a profound effect on
> performance.

Picklesmoke. I was referring to the algorithm used to perform the sort
stage, not the algorithm used to perform the IO, which has the dominant
effect on the overall sort time. I thought that was clear from context.

>After all, the amount of IO done is the most
> important of the things that you should be optimizing for in
> choosing an external sorting algorithm.

Replacement selection pays a heap adjustment, O(log n), for every single
record during run formation, which is terrible in practice. The only
reason it is a customary choice for external sorting is that the runs it
produces are, on average, twice as long.

Using a classical merge sequence, replacement selection leaves you with
half as many subfiles as other run-formation methods. That saves ONE
read and ONE write pass, because halving the subfile count removes
exactly one level from the merge tree.

Suppose, for instance, that you have 64 subfiles. Using any classical
2-way merge, they will have to be read and merged in a first pass,
giving 32, then again giving 16, then 8, then 4, then 2, and one final
pass to create one file: six merge passes, or seven passes counting the
one that created the subfiles.

So, if replacement selection were applied, there would be 32 subfiles
and hence 6 read/write passes instead of seven in this problem set.
After the creation of the original subfiles, the algorithm I listed
reads once and writes once and is done.
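
To make that arithmetic concrete, here is a throwaway snippet (mine,
purely illustrative) that counts passes the same way: one pass to create
the subfiles, plus one pass per 2-way merge level.

    #include <stdio.h>

    /* Run-creation pass plus one pass per level of 2-way merging. */
    static int total_passes(int subfiles)
    {
        int passes = 1;                     /* creating the subfiles */
        while (subfiles > 1) {
            subfiles = (subfiles + 1) / 2;  /* one 2-way merge level */
            passes++;
        }
        return passes;
    }

    int main(void)
    {
        printf("64 subfiles: %d passes\n", total_passes(64));  /* 7 */
        printf("32 subfiles: %d passes\n", total_passes(32));  /* 6 */
        /* single k-way merge: creation pass + one merge pass = 2 */
        return 0;
    }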

So what about the argument that a single k-way merge pass skips around
among the subfiles? Well, first of all, the OS is going to cache the
reads to a large degree. And second, even if we read a single record
with no buffering and wrote a single record for each operation, making
one pass over the data still beats physically reading and writing the
files seven times over.

But don't take my word for it. Try it yourself. It is laughably
trivial to implement.
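
A minimal sketch of the single merge pass follows. I am writing it from
scratch here rather than pasting the book's sample (which uses an
insertion sort for the queue, as noted below); this version uses a
binary min-heap instead, and kway_merge(), struct head, and the
longs-as-text record format are all just assumptions for the example.

    #include <stdio.h>

    #define MAXFILES 64      /* assumed ceiling for this sketch */

    struct head {
        long value;          /* current front record of a subfile */
        FILE *fp;            /* the subfile it came from */
    };

    static void sift_down(struct head *h, int n, int i)
    {
        for (;;) {
            int small = i, l = 2 * i + 1, r = 2 * i + 2;
            if (l < n && h[l].value < h[small].value) small = l;
            if (r < n && h[r].value < h[small].value) small = r;
            if (small == i) return;
            { struct head t = h[i]; h[i] = h[small]; h[small] = t; }
            i = small;
        }
    }

    /* Merge nfiles sorted subfiles of longs into 'out' in one pass. */
    void kway_merge(FILE **files, int nfiles, FILE *out)
    {
        struct head heap[MAXFILES];
        int n = 0, i;

        for (i = 0; i < nfiles; i++)        /* prime with each front */
            if (fscanf(files[i], "%ld", &heap[n].value) == 1)
                heap[n++].fp = files[i];
        for (i = n / 2 - 1; i >= 0; i--)    /* heapify */
            sift_down(heap, n, i);

        while (n > 0) {
            fprintf(out, "%ld\n", heap[0].value);   /* emit minimum */
            if (fscanf(heap[0].fp, "%ld", &heap[0].value) != 1)
                heap[0] = heap[--n];                /* file exhausted */
            sift_down(heap, n, 0);                  /* restore order */
        }
    }

Every record is read once and written once, no matter how many subfiles
there are.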

> Clearly, if we know or can assume the range of the data in question
> the theoretical minimum amount of IO is one pass through all of the
> data (otherwise, we are back in O(lg(n!)) land). Equally clearly, for
> HD's that one pass should involve as few seeks as possible.
>
> In fact, such a principle can be applied to _all_ forms of IO: HD,
> RAM, and CPU cache. The absolute best that any sort can
> possibly do is to make one pass through the data and deduce the
> proper ordering of the data during that one pass.
>
> It's usually also important that our algorithm be Stable, preferably
> Wholly Stable.
>
> Let's call such a sort Optimal External Sort (OES). Just how much
> faster would it be than current practice?
>
> The short answer is the difference between how long it currently
> takes to sort a file vs how long it would take to "cat" the contents
> of the same file to a RAM buffer (_without_ displaying it). IOW,
> there's SIGNIFICANT room for improvement over current
> standard practice in terms of sorting performance, particularly
> external sorting performance.
>
> Since sorting is a fundamental operation in many parts of a DBMS,
> this is a Big Deal.
>
> This discussion has gotten my creative juices flowing. I'll post
> some Straw Man algorithm sketches after I've done some more
> thought.
>
> Ron
>
> > -----Original Message-----
> > From: Dann Corbit <DCorbit(at)connx(dot)com>
> > Sent: Friday, September 23, 2005 2:21 PM
> > Subject: Re: [HACKERS] [PERFORM] Releasing memory during ...
> >
> >For the subfiles, load the top element of each subfile into a
> >priority queue. Extract the min element and write it to disk. If the
> >next value is the same, then the queue does not need to be adjusted.
> >If the next value in the subfile changes, then adjust it.
> >
> >Then, when the lowest element in the priority queue changes, adjust
> >the queue.
> >
> >Keep doing that until the queue is empty.
> >
> >You can create all the subfiles in one pass over the data.
> >
> >You can read all the subfiles, merge them, and write them out in a
> >second pass (no matter how many of them there are).
> >
> The Gotcha with Priority Queues is that their performance depends
> entirely on implementation. In naive implementations either Enqueue()
> or Dequeue() takes O(n) time, which degrades sorting to O(n^2) time.

For a (typically) tiny queue (e.g. if you are merging 100 files) the
choice of queue algorithm is very unimportant. The sample implementation
just uses an insertion sort to order the heads of the queues.
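
Concretely, the per-record work is just a linear insertion, something
like this from-scratch fragment (not the book's code; 'heads' is assumed
sorted ascending, and heads[0] has just been replaced by the next value
from its subfile):

    /* Restore sorted order after heads[0] is replaced: shift smaller
       heads left and drop the new value into its slot.  For ~100 files
       this is only a handful of comparisons per record. */
    static void reinsert_front(long *heads, int *file_of, int n)
    {
        long v = heads[0];
        int f = file_of[0], i = 0;

        while (i + 1 < n && heads[i + 1] < v) {
            heads[i] = heads[i + 1];
            file_of[i] = file_of[i + 1];
            i++;
        }
        heads[i] = v;
        file_of[i] = f;
    }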

> The best implementations I know of need O(lg lg n) time for those
> operations, allowing sorting to be done in O(n lg lg n) time.
> Unfortunately, there's a lot of data manipulation going on in the
> process and two IO passes are required to sort any given file.
> Priority Queues do not appear to be very "IO friendly".
>
> I know of no sorting performance benchmark contest winner based on
> Priority Queues.

That proves it then. On the other hand, why not benchmark it yourself?
It's trivial to implement. The sample code supplied at the link below is
intended only to demonstrate the basic principle, but it is easy enough
to make it robust.

> >Replacement selection is not a good idea any more, since obvious
> >better ideas should take over. Longer runs are of no value if you do
> >not have to do multiple merge passes.
> >
> Judging from the literature and the contest winners, Replacement
> Selection is still a viable and important technique. Besides Priority
> Queues, what "obvious better ideas" have you heard of?
>
>
> >I have explained this general technique in the book "C Unleashed",
> >chapter 13.
> >
> >Sample code is available on the book's home page.
> >
> URL please?
http://users.powernet.co.uk/eton/unleashed/errata/896213nw.zip
