Skip site navigation (1) Skip section navigation (2)

Re: Merge algorithms for large numbers of "tapes"

From: "Dann Corbit" <DCorbit(at)connx(dot)com>
To: "Dann Corbit" <DCorbit(at)connx(dot)com>,"Luke Lonergan" <llonergan(at)greenplum(dot)com>,"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>,"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc: "Simon Riggs" <simon(at)2ndquadrant(dot)com>,<pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: Merge algorithms for large numbers of "tapes"
Date: 2006-03-08 22:09:40
Message-ID: D425483C2C5C9F49B5B7A41F8944154757D5E8@postal.corporate.connx.com (view raw or flat)
Thread:
Lists: pgsql-hackers
There are some articles here that are worth reading if you want to sort
fast:

http://research.microsoft.com/barc/SortBenchmark/

> -----Original Message-----
> From: pgsql-hackers-owner(at)postgresql(dot)org [mailto:pgsql-hackers-
> owner(at)postgresql(dot)org] On Behalf Of Dann Corbit
> Sent: Wednesday, March 08, 2006 1:59 PM
> To: Luke Lonergan; Tom Lane; Jim C. Nasby
> Cc: Simon Riggs; pgsql-hackers(at)postgreSQL(dot)org
> Subject: Re: [HACKERS] Merge algorithms for large numbers of "tapes"
> 
> > -----Original Message-----
> > From: Luke Lonergan [mailto:llonergan(at)greenplum(dot)com]
> > Sent: Wednesday, March 08, 2006 1:52 PM
> > To: Dann Corbit; Tom Lane; Jim C. Nasby
> > Cc: Simon Riggs; pgsql-hackers(at)postgreSQL(dot)org
> > Subject: Re: [HACKERS] Merge algorithms for large numbers of "tapes"
> >
> > Dann,
> >
> > On 3/8/06 12:39 PM, "Dann Corbit" <DCorbit(at)connx(dot)com> wrote:
> >
> > > Here are some suggestions of things that I know work really,
really
> > > well:
> >
> > Can you point to an example?  That might help move the discussion
> along.
> 
> I wrote all of the sorting and merging stuff for CONNX Solutions
> http://www.connx.com
> 
> I have carefully benched all of this stuff and (at least for our
system)
> the ideas I propose work well.  Of course, every system is different
and
> the only way to know if it is an improvement is to try it in place.
> 
> > The reason to interject about the tape goo in this discussion is
that
> we
> > seem to be spending a lot of time optimizing around the tape goo
> without
> > tackling the overall structure of the external sort.  I think we'll
> just
> > end
> > up having to replace all of this goo when we really get around to
> fixing
> > the
> > problem.
> 
> I suggest trying several alternatives and benching them with real
world
> queries and especially with the open database benchmark suite.
> 
> > Add to this that other commercial databases external sort in 1/4 the
> time
> > or
> > better on the same hardware with the same CPU/memory resources using
a
> > 2-pass external sort.
> 
> Our sort merge is so fast that I can join two tables on a column with
no
> index faster than on a database that has a unique clustered index on
the
> column.  Benchmarked against Oracle, SQL*Server, and several others.
> 
> If you check our ORDER BY on a large table with no index, you will see
> that it is competitive with the best commercial systems.
> 
> If you are interested, you could get an eval of CONNX and try it
> yourself (eval is free for some number of days, I don't remember
what).
> 
>  > > #1.  Two pass merge (none of that silly poly-tape merge goo)
> >
> > Voice of reason here.  It's what the other database systems do.
> >
> > > #2.  Load ONLY the keys that are to be sorted into memory.  Use a
> > > pointer exchange sort, and do not move the physical rows of data
at
> all.
> >
> > Sounds right.  Example of this in practice?
> 
> It is what we use here.  It is the only way to fly.  This is well
known,
> and if you read a few articles from the ACM, you will see that it has
> been known for decades.
> 
> > > I am pretty sure from this thread that PostgreSQL is not doing #1,
> and I
> > > have no idea if it is doing #2.
> >
> > Yep.  Even Knuth says that the tape goo is only interesting from a
> > historical perspective and may not be relevant in an era of disk
> drives.
> >
> > - Luke
> >
> 
> 
> ---------------------------(end of
broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings

pgsql-hackers by date

Next:From: Alvaro HerreraDate: 2006-03-08 22:23:53
Subject: Re: [COMMITTERS] pgsql: Remove Christof Petig copyright on include file, per author
Previous:From: Jonah H. HarrisDate: 2006-03-08 22:05:59
Subject: Re: Coverity Open Source Defect Scan of PostgreSQL

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group