Re: Merge algorithms for large numbers of "tapes"

From: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Simon Riggs" <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Merge algorithms for large numbers of "tapes"
Date: 2006-03-08 01:28:17
Message-ID: 36e682920603071728rdb6fe79m81258143e1cf01d1@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 3/7/06, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> BTW, I was just looking over Knuth's discussion of sorting again, and
> realized that there is still something more that could be done within
> the existing sort framework. We currently use standard polyphase merge
> (his Algorithm 5.4.2D), which IIRC I chose because it was simple and
> for relatively small numbers of tapes T it was about as good as anything
> else. Knuth spends a great deal of energy on minimizing tape rewind
> time which of course is of no interest to us, and I had supposed that
> all of his more-complex algorithms were really only of interest if you
> needed to consider rewind time. However, now that we've changed the
> code to prefer large numbers of tapes, it's not at all clear that
> Algorithm D is still the right one to use. In particular I'm looking at
> cascade merge, Algorithm 5.4.3C, which appears to use significantly
> fewer passes when T is large. Do you want to try that?

I haven't personally played with this algorithm but having spent the last 15
minutes reading it over, it does sound like an interesting idea for trial.
At first glance it didn't seem much better than polyphase for our case, but
after reading the entire algorithm, discussion, and thinking it over for a
couple minutes, I could see it as potentially better.

Guess we won't really know 'til it can be tested :)

--
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
732.331.1324

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2006-03-08 03:25:06 Re: Problemas with gram.y
Previous Message Alvaro Herrera 2006-03-07 23:49:24 Re: pg_freespacemap question