Re: qsort again (was Re: [PERFORM] Strange Create Index

From: Jens-Wolfhard Schicke <ml+pgsql-hackers(at)asco(dot)de>
To: Dann Corbit <DCorbit(at)connx(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: qsort again (was Re: [PERFORM] Strange Create Index
Date: 2006-02-17 08:18:39
Message-ID: 97E2D852E73713354532B531@[192.168.1.72]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

--On Donnerstag, Februar 16, 2006 10:39:45 -0800 Dann Corbit
<DCorbit(at)connx(dot)com> wrote:

> He refers to counting sort and radix sort (which comes in most
> significant digit and least significant digit format). These are also
> called distribution (as opposed to comparison) sorts.
>
> These sorts are O(n) as a function of the data size, but really they are
> O(M*n) where M is the average key length and n is the data set size.
> (In the case of MSD radix sort M is the average length to completely
> differentiate strings)
>
> So if you have an 80 character key, then 80*log(n) will only be faster
I suppose you meant 80 * n here?

> than n*log(n) when you have 2^80th elements -- in other words -- never.
I think this is wrong. You can easily improve Radix sort by a constant if
you don't take single bytes as the digits but rather k-byte values. At
least 2 byte should be possible without problems. This would give you 40 *
n time, not 80 * n. And you assume that the comparision of an 80-byte wide
value only costs 1, which might (and in many cases will be imho) wrong.
Actually it migh mean to compare 80 bytes as well, but I might be wrong.

What I think as the biggest problem is the digit representation necessary
for Radix-Sort in cases of locales which sort without looking at spaces. I
assume that would be hard to implement. The same goes for the proposed
mapping of string values onto 4/8-byte values.

Mit freundlichem Gruß
Jens Schicke
--
Jens Schicke j(dot)schicke(at)asco(dot)de
asco GmbH http://www.asco.de
Mittelweg 7 Tel 0531/3906-127
38106 Braunschweig Fax 0531/3906-400

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Christian Bird 2006-02-17 08:26:03 who is pgsql in cvs
Previous Message Luke Lonergan 2006-02-17 07:35:37 Re: In Japan with Josh Berkus

Browse pgsql-performance by date

  From Date Subject
Next Message Martijn van Oosterhout 2006-02-17 09:17:49 Re: qsort again (was Re: [PERFORM] Strange Create Index
Previous Message martial.bizel 2006-02-17 08:12:39 split partitioned table across several postgres servers