Re: Merge algorithms for large numbers of "tapes"

From: "Luke Lonergan" <llonergan(at)greenplum(dot)com>
To: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, "Greg Stark" <gsstark(at)mit(dot)edu>
Cc: "Dann Corbit" <DCorbit(at)connx(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Simon Riggs" <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Merge algorithms for large numbers of "tapes"
Date: 2006-03-09 17:03:55
Message-ID: C0359FFB.1ED96%llonergan@greenplum.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Jim,

On 3/9/06 8:35 AM, "Jim C. Nasby" <jnasby(at)pervasive(dot)com> wrote:

> Well, the reality remains though; most folks are unlikely to setup
> enough dedicated temp areas so that we can do one tape per disk, so it
> would be really good to have a sort method that didn't rely on that.

Agreed - however optimizing the run output and merge pass is straightforward
without knowing the underlying I/O infrastructure.

Consider that a popular commercial database, running on a 6-disk RAID5 with
one filesystem, performs external sorting 4 times faster (1/4 of the time)
than Postgres using a two pass sort. There is no special optimization of
the I/O path involved, it's simply a matter of using a modern external
sorting approach (no tapes).

Tom's point about finite memory is definitely important - it does take
roughly SQRT(sort set) of memory to perform the two pass sort, but that is a
completely manageable amount of memory. The problem we have now is that we
don't use a dynamic memory allocation mechanism to provide this amount of
RAM to the task. That's why the tape algorithm is "safe", because you can
guarantee an external sort result, even with tiny memory.

But I believe the right answer is to implement the modern sorting algorithm
and the memory allocation to support it. Sorting is too important to most
operations to be so far behind - 400% slower is not acceptable, and I don't
think tweaking the current approach will get us there.

- Luke

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message William ZHANG 2006-03-09 17:05:28 Re: Proposal for SYNONYMS
Previous Message Stefan Kaltenbrunner 2006-03-09 16:36:32 Re: problem with large maintenance_work_mem settings and