Hi to all.
It seems a previous mail of mine with following body hasn't been sent.
Sorry for possibly getting it twice.
Actually I have now modified that body, so it's worth to read it once again.
Thanks for your attention.
Well, the refinements are the followings:
Using 2 heaps instead of just one:
one heap creating a "descending" run and the
other one creating an "ascending" run.
Both associated to the same "logical" run.
Suppose we want the input elements to be finally sorted in an ascending
order. To do this we could QuickSort the first M initialization elements
and then divide it into 2 parts.
Suppose the first heap creates the following run:
And suppose the second heap creates the following run:
Those two runs can be seen as just one by mergesort... since they "could" be
physically merged into one single run: at first we could write the elements
3,5,7 and then the elements of the other run, red upside down.
Having two heaps of that kinds lets RS better adapt to local variations of
the input trend.
This technique can be called Two Ways Replacement Selection (2WRS) just
because of those 2 heaps.
As an extreme example, we can say that having the input already sort in
no more leads us to the worst case: with 2WRS no matter the input is already
in ascending/descending order... in this case we'll produce just one run
of producing the maximum number of runs as in RS worst case (input in
Moreover it lets us to grow the current run in 2 ways: just imagine we would
in a regular file. With 2WRS this could be seen as start outputting elements
from the middle
of such a regular file, the descending heap outputting elements from the
while the ascending one outputting from the middle downward. This could
a smaller number of "dead records" (as I said in previous mails, a dear
record is an element
that won't form part of the current run) and so having longer runs.
Others optimizations, for example, can be done with the "virtual
storing a cache of couples (first_element,last_element) for each created
could be useful in case we can find 2 couples (first_element_1,
(first_element_2, last_element_2) with last_element_1 <= first_element_2.
In this case, those runs too can be seen as belonging to the same "logical
(actually they are 2 RS different physical runs, or even 4 in 2WRS
but can be seen as just one by mergesort). Of course, once those 2 (or 4)
logically merged into that only one, this last one in turn could be merged
to other runs.
What does all that imply? Mergesort would actually consider a smaller number
(since it should just work on logical runs). This means less jumps between
runs on disk.
Now... to test those refinements I should integrate my code into
PostgreSQL... but it's not that easy for me...
Thanks for your attention.
pgsql-hackers by date
|Next:||From: Bernd Helmle||Date: 2007-11-27 09:46:58|
|Subject: Re: maintenance_work_mem memory constraint? |
|Previous:||From: Guillaume Smet||Date: 2007-11-27 08:08:56|
|Subject: Re: [PATCHES] Proposed patch for operator lookup caching|