Re: Merge algorithms for large numbers of "tapes"

From: "Dann Corbit" <DCorbit(at)connx(dot)com>
To: "Luke Lonergan" <llonergan(at)greenplum(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc: "Simon Riggs" <simon(at)2ndquadrant(dot)com>, <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: Merge algorithms for large numbers of "tapes"
Date: 2006-03-08 21:59:07
Message-ID: D425483C2C5C9F49B5B7A41F8944154757D5E7@postal.corporate.connx.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> -----Original Message-----
> From: Luke Lonergan [mailto:llonergan(at)greenplum(dot)com]
> Sent: Wednesday, March 08, 2006 1:52 PM
> To: Dann Corbit; Tom Lane; Jim C. Nasby
> Cc: Simon Riggs; pgsql-hackers(at)postgreSQL(dot)org
> Subject: Re: [HACKERS] Merge algorithms for large numbers of "tapes"
>
> Dann,
>
> On 3/8/06 12:39 PM, "Dann Corbit" <DCorbit(at)connx(dot)com> wrote:
>
> > Here are some suggestions of things that I know work really, really
> > well:
>
> Can you point to an example? That might help move the discussion
along.

I wrote all of the sorting and merging stuff for CONNX Solutions
http://www.connx.com

I have carefully benched all of this stuff and (at least for our system)
the ideas I propose work well. Of course, every system is different and
the only way to know if it is an improvement is to try it in place.

> The reason to interject about the tape goo in this discussion is that
we
> seem to be spending a lot of time optimizing around the tape goo
without
> tackling the overall structure of the external sort. I think we'll
just
> end
> up having to replace all of this goo when we really get around to
fixing
> the
> problem.

I suggest trying several alternatives and benching them with real world
queries and especially with the open database benchmark suite.

> Add to this that other commercial databases external sort in 1/4 the
time
> or
> better on the same hardware with the same CPU/memory resources using a
> 2-pass external sort.

Our sort merge is so fast that I can join two tables on a column with no
index faster than on a database that has a unique clustered index on the
column. Benchmarked against Oracle, SQL*Server, and several others.

If you check our ORDER BY on a large table with no index, you will see
that it is competitive with the best commercial systems.

If you are interested, you could get an eval of CONNX and try it
yourself (eval is free for some number of days, I don't remember what).

> > #1. Two pass merge (none of that silly poly-tape merge goo)
>
> Voice of reason here. It's what the other database systems do.
>
> > #2. Load ONLY the keys that are to be sorted into memory. Use a
> > pointer exchange sort, and do not move the physical rows of data at
all.
>
> Sounds right. Example of this in practice?

It is what we use here. It is the only way to fly. This is well known,
and if you read a few articles from the ACM, you will see that it has
been known for decades.

> > I am pretty sure from this thread that PostgreSQL is not doing #1,
and I
> > have no idea if it is doing #2.
>
> Yep. Even Knuth says that the tape goo is only interesting from a
> historical perspective and may not be relevant in an era of disk
drives.
>
> - Luke
>

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2006-03-08 22:00:27 pgsql: Remove Christof Petig copyright on include file, per author
Previous Message Neil Conway 2006-03-08 21:57:52 Re: [PATCHES] Add switches for DELIMITER and NULL in pg_dump COPY