Re: Replacement Selection

From: <mac_man2005(at)hotmail(dot)it>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Replacement Selection
Date: 2007-11-27 08:25:35
Message-ID: BAY132-DS30840B8472DA2670B81C2E6760@phx.gbl
Lists: pgsql-hackers

Hi to all.

It seems a previous mail of mine with the following body was never sent.
Sorry if you end up receiving it twice.

I have since modified that body, so it is worth reading once again.

Thanks for your attention.
Regards.

------------PREVIOUS MAIL--------------------------
Well, the refinements are the following:

Use 2 heaps instead of just one:
one heap creates a "descending" run and the
other one creates an "ascending" run.
Both are associated with the same "logical" run.

Suppose we want the input elements to end up sorted in ascending
order. To do this we could quicksort the first M initialization elements
in RAM and then divide them into 2 parts.
Suppose the first heap creates the following run:
10
9
8

And suppose the second heap creates the following run:
3
5
7

Those two runs can be seen as just one by mergesort, since they "could" be
physically merged into one single run: first we would write the elements
3, 5, 7 and then the elements of the other run, read upside down (8, 9, 10).
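
To make the reading order concrete, here is a minimal, self-contained
sketch (plain C, illustrative only, not PostgreSQL code) of how the merge
step could consume the two physical runs above as one ascending logical
run: the ascending run is read front to back, then the descending run is
read back to front.

/*
 * Reading one "logical" run stored as two physical runs: the ascending
 * half in its natural order, then the descending half "upside down".
 * Output: 3 5 7 8 9 10
 */
#include <stdio.h>

int
main(void)
{
    int descending_run[] = {10, 9, 8};  /* written by the first heap  */
    int ascending_run[] = {3, 5, 7};    /* written by the second heap */
    int i;

    for (i = 0; i < 3; i++)             /* ascending half, front to back  */
        printf("%d\n", ascending_run[i]);

    for (i = 2; i >= 0; i--)            /* descending half, back to front */
        printf("%d\n", descending_run[i]);

    return 0;
}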

Possible advantages:
Having two heaps of that kind lets RS adapt better to local variations of
the input trend.
This technique can be called Two Ways Replacement Selection (2WRS) precisely
because of those 2 heaps.
As an extreme example, input that is already sorted in reverse order no
longer leads us to the worst case: with 2WRS it does not matter whether the
input is already sorted in ascending or descending order... in either case
we produce just one run, instead of the maximum number of runs as in the RS
worst case (input in reverse order).
Moreover, it lets us grow the current run in 2 directions: just imagine we
output runs to a regular file. With 2WRS this could be seen as starting to
output elements from the middle of such a file, the descending heap
outputting elements from the middle upwards while the ascending one outputs
from the middle downwards. This could mean a smaller number of "dead
records" (as I said in previous mails, a dead record is an element that
won't form part of the current run) and hence longer runs.
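
The exact liveness test 2WRS would use is not spelled out here, but the
two-boundary idea can be sketched as follows (plain C, illustrative only;
the function names and the numbers in main() are assumptions, not
PostgreSQL code): an element outside the window between the last element
emitted on the ascending side and the last element emitted on the
descending side is necessarily a dead record for the current logical run,
and since each side emits only part of the output, the ascending boundary
can rise more slowly than the single boundary of classic RS.

#include <stdbool.h>
#include <stdio.h>

/* Classic RS building an ascending run: an element is dead for the
 * current run if it is smaller than the last element already written. */
static bool
rs_is_dead(int elem, int last_output)
{
    return elem < last_output;
}

/* 2WRS: the current logical run has emitted up to last_asc on the
 * ascending side and down to last_desc on the descending side, so
 * anything outside [last_asc, last_desc] cannot join this logical run. */
static bool
twrs_is_dead(int elem, int last_asc, int last_desc)
{
    return elem < last_asc || elem > last_desc;
}

int
main(void)
{
    /* Hypothetical snapshot after both variants have written out the same
     * number of elements: classic RS has pushed its single boundary to 12,
     * while 2WRS has emitted only half as much from its ascending side
     * (boundary 7) and the rest from its descending side (boundary 30). */
    int elem = 9;

    printf("classic RS: %d dead? %s\n",
           elem, rs_is_dead(elem, 12) ? "yes" : "no");       /* yes */
    printf("2WRS:       %d dead? %s\n",
           elem, twrs_is_dead(elem, 7, 30) ? "yes" : "no");  /* no  */
    return 0;
}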

Other optimizations can be done, for example, with the "virtual
concatenation" technique: storing a cache of pairs (first_element,
last_element), one for each created run. This is useful whenever we can
find 2 pairs (first_element_1, last_element_1) and (first_element_2,
last_element_2) with last_element_1 <= first_element_2.
In this case those runs too can be seen as belonging to the same "logical
run" (actually they are 2 different physical RS runs, or even 4 with 2WRS,
but they can be seen as just one by mergesort). Of course, once those 2 (or
4) runs are logically merged into that single one, the result can in turn
be merged with other runs.
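
A minimal sketch of the bookkeeping such a cache would need (plain C,
illustrative only; the struct and function names are invented for this
example): each run remembers its first and last element, two runs can be
chained into one logical run whenever the last element of the first is
<= the first element of the second, and the resulting bounds can then be
tested against further runs in the same way.

#include <stdbool.h>
#include <stdio.h>

typedef struct RunBounds
{
    int first_element;      /* smallest key in the run (runs are sorted) */
    int last_element;       /* largest key in the run */
} RunBounds;

/* Can run a be followed directly by run b within one logical run? */
static bool
can_concatenate(const RunBounds *a, const RunBounds *b)
{
    return a->last_element <= b->first_element;
}

/* Bounds of the logical run obtained by chaining a and b. */
static RunBounds
concatenate_bounds(const RunBounds *a, const RunBounds *b)
{
    RunBounds logical;

    logical.first_element = a->first_element;
    logical.last_element = b->last_element;
    return logical;
}

int
main(void)
{
    RunBounds r1 = {3, 10};     /* one physical run covering keys 3..10 */
    RunBounds r2 = {12, 40};    /* another physical run, keys 12..40    */

    if (can_concatenate(&r1, &r2))
    {
        RunBounds logical = concatenate_bounds(&r1, &r2);

        printf("logical run covers %d..%d\n",
               logical.first_element, logical.last_element);
    }
    return 0;
}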

What does all that imply? Mergesort would actually consider a smaller
number of runs (since it only has to work on logical runs). This means
fewer jumps between runs on disk.

Now... to test those refinements I should integrate my code into
PostgreSQL... but it's not that easy for me...

Thanks for your attention.
------------PREVIOUS MAIL--------------------------
