Re: No merge sort?

From: cbbrowne(at)cbbrowne(dot)com
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: No merge sort?
Date: 2003-04-14 16:52:57
Message-ID: 20030414165257.D226D56FE4@cbbrowne.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Ron Peacetree wrote:
> ""Dann Corbit"" <DCorbit(at)connx(dot)com> wrote in message
> news:D90A5A6C612A39408103E6ECDD77B829408ACB(at)voyager(dot)corporate(dot)connx(dot)co
> m...
> > > From: Ron Peacetree [mailto:rjpeace(at)earthlink(dot)net]=20
> > > Sent: Wednesday, April 09, 2003 12:38 PM
> > > You only need as many bins as you have unique key values=20
> > > silly ;-) Remember, at its core Radix sort is just a=20
> > > distribution counting sort (that's the name I learned for the=20
> > > general technique). The simplest implementation uses bits as=20
> > > the atomic unit, but there's nothing saying you have to...=20=20
> > > In a DB, we know all the values of the fields we currently=20
> > > have stored in the DB. We can take advantage of that.
> >
> > By what magic do we know this? If a database knew a-priori what all
> the
> > distinct values were, it would indeed be excellent magic.
> For any table already in the DB, this is self evident. If you are
> talking about sorting data =before= it gets put into the DB (say for a
> load), then things are different, and the best you can do is used
> comparison based methods (and probably some reasonably sophisticated
> external merging routines in addition if the data set to be sorted and
> loaded is big enough). The original question as I understood it was
> about a sort as part of a query. That means everything to be sorted
> is in the DB, and we can take advantage of what we know.
>
> > With 80 bytes, you have 2^320 possible values. There is no way
> > around that. If you are going to count them or use them as a
> > radix, you will have to classify them. The only way you will
> > know how many unique values you have in
> > "Company+Division" is to ...
> > Either sort them or by some means discover all that are distinct

> Ummm, Indexes, particularly Primary Key Indexes, anyone? Finding the
> unique values in an index should be perceived as trivial... ...and you
> often have the index in memory for other reasons already.

But this does not remove the fact that a radix sort on an 80 byte field
requires O(2^320) space, that is, O(N), where N is the size of the state
space of the key...

Perhaps you need to reread Gonnet; it tells you that...
--
output = reverse("moc.enworbbc@" "enworbbc")
http://www3.sympatico.ca/cbbrowne/multiplexor.html
Rules of the Evil Overlord #28. "My pet monster will be kept in a
secure cage from which it cannot escape and into which I could not
accidentally stumble." <http://www.eviloverlord.com/>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2003-04-14 17:00:55 Re: [GENERAL] Problem about pgsql's column alias
Previous Message John Gray 2003-04-14 16:52:34 Re: How do you execute a postgresql function from perl?