Re: finding medians

From: "Dann Corbit" <DCorbit(at)connx(dot)com>
To: "Josh Burdick" <jburdick(at)gradient(dot)cis(dot)upenn(dot)edu>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: finding medians
Date: 2002-05-30 20:48:28
Message-ID: D90A5A6C612A39408103E6ECDD77B82906F460@voyager.corporate.connx.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

ACK! Sorting to find a median is criminal.

"Introduction to Algorithms" by Thomas H. Cormen, Charles E. Leiserson,
Ronald L. Rivest
ISBN: 0262031418
explains the better algorithm very well.

Here is a freely available C++ template (written by me) for a bunch of
statistics (everything *but* the selection problem):
ftp://cap.connx.com/pub/tournament_software/STATS.HPP

It uses this template for improved summation accuracy:
ftp://cap.connx.com/pub/tournament_software/Kahan.Hpp

Here is an outline for selection. I wrote it in C++, but a rewrite to C
is trivial:

// Quickselect: find Kth smallest of first N items in array A
// recursive routine finds Kth smallest in A[Low..High]
// Etype: must have copy constructor, oeprator=, and operator<
// Nonrecursive driver is omitted.

template < class Etype >
void
QuickSelect (Etype A[], int Low, int High, int k)
{
if (Low + Cutoff > High)
InsertionSort (&A[Low], High - Low + 1);
else
{
// Sort Low, Middle, High
int Middle = (Low + High) / 2;

if (A[Middle] < A[Low])
Swap (A[Low], A[Middle]);
if (A[High] < A[Low])
Swap (A[Low], A[High]);
if (A[High] < A[Middle])
Swap (A[Middle], A[High]);

// Place pivot at Position High-1
Etype Pivot = A[Middle];
Swap (A[Middle], A[High - 1]);

// Begin partitioning
int i, j;
for (i = Low, j = High - 1;;)
{
while (A[++i] < Pivot);
while (Pivot < A[--j]);
if (i < j)
Swap (A[i], A[j]);
else
break;
}

// Restore pivot
Swap (A[i], A[High - 1]);

// Recurse: only this part changes
if (k < i)
QuickSelect (A, Low, i - 1, k);
else if (k > i)
QuickSelect (A, i + 1, High, k);
}
}

template < class Etype >
void
QuickSelect (Etype A[], int N, int k)
{
QuickSelect (A, 0, N - 1, k - 1);
}

If you want to use this stuff to improve statistics with vacuum, be my
guest.

Browse pgsql-hackers by date

  From Date Subject
Next Message Dann Corbit 2002-05-30 21:03:26 Re: finding medians
Previous Message Josh Berkus 2002-05-30 20:30:09 Re: finding medians