Re: Sorting Improvements for 8.4

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Sorting Improvements for 8.4
Date: 2008-03-27 20:38:02
Message-ID: 200803272038.m2RKc2w09518@momjian.us
Lists: pgsql-hackers


Added to TODO:

> * Consider being smarter about memory and external files used during
> sorts
>
> http://archives.postgresql.org/pgsql-hackers/2007-11/msg01101.php
> http://archives.postgresql.org/pgsql-hackers/2007-12/msg00045.php

---------------------------------------------------------------------------

Simon Riggs wrote:
> Just wanted to review a few thoughts and ideas around improving
> external sorts, as Jim Nasby recently encouraged me to do.
>
> Current issues/opportunities are these:
>
> ISSUES
>
> a) Memory is always in short supply, so using what we have more
> effectively is going to be welcome.
>
> b) Heap sort has a reasonably strong anti-memory effect, meaning that
> there is an optimum amount of memory for any sort. This shows itself
> as CPU time increasing during run forming, making this stage of the
> sort CPU bound.
>
> c) Many sorts are performed prior to aggregation. It might be possible
> to aggregate prior to writing to disk, as a way of reducing the overall
> I/O cost. The benefit would occur when the total CPU cost was the same
> no matter when aggregation occurred; that would not apply in all cases,
> so we would need to sense when a benefit was possible.
>
> d) Generally reducing the I/O cost of sorting may help the merging
> stages of a sort.
>
>
> SOLUTIONS
>
> The ideas that Greg Stark, Jim Nasby, Heikki and I have discussed
> to date were the following:
>
> 1. Sort I/O Compression
> 2. Aggregation during Sort
> 3. Memory Pools
> 4. Dynamic Heap Management
> 5. Dynamic Run Handling
>
> I've added (5) to the list as well, which hasn't yet been discussed.
>
> 1. SORT I/O COMPRESSION
>
> This idea is not dead yet; it just needs a full set of tests to confirm
> that there is a benefit in all cases. If the benefit doesn't hold in all
> cases, we may be able to work out which cases those are, so we know when
> to use it.
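>
> As a strictly illustrative sketch (the block layout, flag byte and
> buffer sizes here are invented, not tuplesort's actual on-disk
> format), compressing each tape block with zlib before it is written
> might look like this:
>
> #include <stdint.h>
> #include <stdio.h>
> #include <string.h>
> #include <zlib.h>
>
> #define TAPE_BLCKSZ 8192
>
> /*
>  * Write one sort-tape block, compressed only when that actually saves
>  * space; a one-byte flag plus a length prefix lets the reader undo it.
>  */
> static int
> write_block(FILE *tape, const unsigned char *block)
> {
>     unsigned char buf[TAPE_BLCKSZ + 1024];
>     uLongf        clen = sizeof(buf);
>     unsigned char flag;
>     uint32_t      len;
>
>     if (compress2(buf, &clen, block, TAPE_BLCKSZ, Z_BEST_SPEED) == Z_OK
>         && clen < TAPE_BLCKSZ)
>     {
>         flag = 1;                       /* stored compressed */
>         len = (uint32_t) clen;
>     }
>     else
>     {
>         flag = 0;                       /* incompressible: store raw */
>         len = TAPE_BLCKSZ;
>         memcpy(buf, block, TAPE_BLCKSZ);
>     }
>
>     if (fwrite(&flag, 1, 1, tape) != 1 ||
>         fwrite(&len, sizeof(len), 1, tape) != 1 ||
>         fwrite(buf, 1, len, tape) != len)
>         return -1;                      /* write error */
>     return 0;
> }
>
> The tests would mostly be probing whether Z_BEST_SPEED's CPU cost pays
> for itself, which depends on how I/O-bound the sort is.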
>
>
> 2. AGGREGATION DURING SORT
>
> Many sorts are preliminary steps before aggregation. Aggregation during
> run forming would potentially reduce the size of the heap and the number
> of comparisons. For many types of aggregate this would not theoretically
> increase the number of operations, since sum(), avg(), min() and max()
> are all commutative across their inputs. We would probably need to add
> another option to Aggregate Functions to indicate the possibility of
> calculating the aggregate in this way, since some aggregates might rely
> on the current behaviour of receiving all their inputs at once, in
> sorted order. (Windowed aggregates are unlikely to work this way.)
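>
> To make the shape of this concrete, here is a hedged sketch (types and
> helpers invented for the example, not the executor's real API): as
> sorted tuples pop off the run-forming heap, consecutive tuples with
> equal group keys are folded into one partial-aggregate row before
> anything is written to tape.
>
> typedef struct
> {
>     int     key;            /* group key                            */
>     long    sum;            /* partial sum() state                  */
>     long    count;          /* partial count(); enables avg() later */
> } AggRow;
>
> /* Called with tuples in key order as they leave the heap. */
> static void
> emit_to_run(int key, long value, AggRow *cur, int *cur_valid,
>             void (*write_row)(const AggRow *))
> {
>     if (*cur_valid && cur->key == key)
>     {
>         cur->sum += value;  /* same group: combine, write nothing */
>         cur->count++;
>         return;
>     }
>     if (*cur_valid)
>         write_row(cur);     /* group changed: flush previous state */
>     cur->key = key;
>     cur->sum = value;
>     cur->count = 1;
>     *cur_valid = 1;
> }
>
> Partial states from different runs still have to be combined at merge
> time, which is exactly what the extra option on Aggregate Functions
> would have to promise is safe.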
>
>
> 3. MEMORY POOLS
>
> Solving a) could be done by sensible management and allocation of
> resources. Discussed before, so not rehashed here.
>
>
> 4. DYNAMIC HEAP MANAGEMENT
>
> The size of the active heap required to produce the fewest runs varies
> as the sort progresses. For example, sorting an already
> sorted input needs a trivial heap size.
>
> Larger heap sizes simply avoid forming more runs, but more runs are not
> necessarily a bad thing. They only become a problem when we go beyond
> our ability to perform a single final merge (see Dynamic Run Handling
> below).
>
> Smaller heap sizes reduce the number of comparisons required and
> improve L2+ cache efficiency. Those two things are the cause of the
> anti-memory effect.
>
> Because of b), optimising the size of the heap could potentially be a
> good thing. This can make a considerable difference for nearly sorted
> data (measurements required...).
>
> When we have M amount of memory available to us, we don't start by using
> it all. We start with m memory and only increase up to M if required.
> Runs are built with memory set at m. If a tuple arrives that would
> force the formation of a new run, we assess:
>
> i) do we care if another run is formed? Compare the likely amount of
> data still to come with the number of runs formed so far and see whether
> we really care. If we don't, allow the new run to be formed and carry on
> with a heap size of just m (see Dynamic Run Handling later).
>
> ii) if we do care about the number of runs, then allow the heap to grow
> by increments up to the full size of M. Increments would be at least x2
> and possibly x4, so that we always have work space to rearrange the
> heap.
>
> All of this dances too cleverly around the exact technique and potential
> costs of rearranging the heap. That is not to be ignored and is the next
> task in evaluating and accepting/dismissing this potential technique.
>
> In combination with memory pooling this technique might also allow
> memory to be better distributed to other users.
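>
> As a sketch of the policy in i) and ii) above (the names and the
> simple doubling rule are assumptions for illustration, not a
> worked-out design):
>
> #include <stdbool.h>
> #include <stddef.h>
>
> typedef struct
> {
>     size_t  mem_used;        /* memory currently held by the heap    */
>     size_t  mem_limit;       /* current working budget m             */
>     size_t  mem_max;         /* hard ceiling M                       */
>     int     runs_so_far;
>     int     runs_acceptable; /* runs one final merge can still take  */
> } SortHeap;
>
> /* Decide what to do when tuple_size bytes would overflow m. */
> static bool
> heap_try_grow(SortHeap *h, size_t tuple_size)
> {
>     if (h->mem_used + tuple_size <= h->mem_limit)
>         return true;                /* still fits; nothing to decide */
>
>     /* i) another run is harmless: keep m, let the new run form */
>     if (h->runs_so_far + 1 < h->runs_acceptable)
>         return false;
>
>     /* ii) we do care: double m, capped at M, until the tuple fits */
>     while (h->mem_limit < h->mem_max &&
>            h->mem_used + tuple_size > h->mem_limit)
>     {
>         h->mem_limit *= 2;
>         if (h->mem_limit > h->mem_max)
>             h->mem_limit = h->mem_max;
>     }
>     return h->mem_used + tuple_size <= h->mem_limit;
> }
>
> Returning false means "let the new run form"; the open question above,
> how the heap is physically rearranged once m grows, is untouched here.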
>
>
> 5. DYNAMIC RUN HANDLING (in Final Merge)
>
> Another way of addressing a) is to simply make better use of memory
> itself. Let's look at that in more detail:
>
> The number of runs that can be merged at once is currently fixed, based
> upon available memory. This has the underlying assumption that all runs
> will be concurrently active during final merging, which may not always
> be true.
>
> If we have random data then almost all runs will overlap with all other
> runs, i.e. the min and max values are sufficiently wide that the runs do
> all overlap. In many cases, data arrives in somewhat sorted order, e.g.
> financial data is fairly regular with some late payers but not many, and
> those trail off with a fairly tight decay. In the somewhat sorted case
> we find that the actual overlap is less than total, so there are many
> later runs that don't overlap the earlier ones. In the best case we
> would find that runs 1 and 2 overlap, runs 2 and 3 overlap, then runs 3
> and 4 overlap.
>
> This is also the point where I suggest breaking away from Knuth
> completely. All of the main algorithms described by Knuth are tape
> sorts. A run is written to a particular tape and then stays there until
> "moved" to another tape. That means we have to get super-clever about
> how runs should be written and formed (see Knuth). If we realise that
> the runs aren't fixed to particular tapes but are all just independent
> runs, we can radically rethink sorting. There is no need to implement
> Cascade Sort, but we do need to rethink merging from the ground up. (All
> of which is a relief, because Knuth et al are definitely smarter than
> me, but I've got disks and lots of memory and those guys had tapes.)
>
> If we track the min and max values for each run, when run building is
> finished we will be able to build a merging plan that allows us to be
> smart about the runs we should bring together. We start with the run
> with the lowest min value, as well as all runs that overlap that run.
> When that run is exhausted we move to the next lowest and at that point
> start merging all runs that overlap that one.
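>
> For illustration, a self-contained sketch of reading a merge plan off
> those per-run bounds (integer keys and all names invented for the
> example): sort runs by min key, then find how many are ever "open" at
> once. That peak, not the total run count, is the fan-in the merge
> actually needs.
>
> #include <stdio.h>
> #include <stdlib.h>
>
> typedef struct { int min, max; } RunBounds;
>
> static int
> cmp_min(const void *a, const void *b)
> {
>     return ((const RunBounds *) a)->min - ((const RunBounds *) b)->min;
> }
>
> /*
>  * Peak number of concurrently active runs.  The maximum overlap of a
>  * set of intervals always occurs at some interval's min key, so it is
>  * enough to check each run's starting point.
>  */
> static int
> merge_fanin_needed(RunBounds *runs, int nruns)
> {
>     int     peak = 0;
>
>     qsort(runs, nruns, sizeof(RunBounds), cmp_min);
>     for (int i = 0; i < nruns; i++)
>     {
>         int     open = 1;
>
>         for (int j = 0; j < i; j++)
>             if (runs[j].max >= runs[i].min)  /* run j still active */
>                 open++;
>         if (open > peak)
>             peak = open;
>     }
>     return peak;
> }
>
> int
> main(void)
> {
>     /* somewhat-sorted input: each run overlaps only its neighbours */
>     RunBounds runs[] = {{0, 110}, {100, 210}, {200, 310}, {300, 410}};
>
>     printf("fan-in needed: %d of %d runs\n",
>            merge_fanin_needed(runs, 4), 4);  /* prints 2 of 4 */
>     return 0;
> }
>
> Here four runs never need more than a two-way merge at any moment.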
>
> This then means we may be able to begin final merging with more runs
> than the current cut-off. In principle we could merge an unbounded
> number of runs in the final merge with fixed memory, provided only a
> few of them overlap at any point in the key range. If we *do* need to
> merge we can work out which runs should be our best pre-merge
> candidates, based upon how big they are and which other runs they
> overlap. (That's much better than being forced to merge tapes 2, 7 and
> 17 because some bizarre math says so (see Knuth).)
>
> Anyway, claiming to have found a better way than Knuth makes me feel a
> little nervous, so some searching questions on this are very welcome.
>
> Interestingly, if we combine this technique with dynamic heap management
> we may be able to allow a very large number of efficiently written runs
> to form without it causing any merging.
>
> mac_man recently noted the possibility that some runs don't overlap at
> all and so can be merged for free. That's true, though it doesn't
> actually improve on the basic idea here, which is building a merge plan
> after runs have been formed, with an eye on minimizing and potentially
> eliminating the merge phase.
>
> There's probably some typos or thinkos above, so go easy on me Greg!
> They aren't there because I want to skim over anything.
>
> I'm not likely to get a chance to do all of this in the near future, so
> documenting it now should help others to carry things forward.
>
> --
> Simon Riggs
> 2ndQuadrant http://www.2ndQuadrant.com

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
