Re: [HACKERS] A Better External Sort?

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Jeffrey W(dot) Baker" <jwbaker(at)acm(dot)org>, Luke Lonergan <llonergan(at)greenplum(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Ron Peacetree <rjpeace(at)earthlink(dot)net>, pgsql-hackers(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: [HACKERS] A Better External Sort?
Date: 2005-10-01 16:17:17
Message-ID: 87r7b5ciuq.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance


Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> "Jeffrey W. Baker" <jwbaker(at)acm(dot)org> writes:
> > I think the largest speedup will be to dump the multiphase merge and
> > merge all tapes in one pass, no matter how large M. Currently M is
> > capped at 6, so a sort of 60GB with 1GB sort memory needs 13 passes over
> > the tape. It could be done in a single pass heap merge with N*log(M)
> > comparisons, and, more importantly, far less input and output.
>
> I had more or less despaired of this thread yielding any usable ideas
> :-( but I think you have one here. The reason the current code uses a
> six-way merge is that Knuth's figure 70 (p. 273 of volume 3 first
> edition) shows that there's not much incremental gain from using more
> tapes ... if you are in the regime where number of runs is much greater
> than number of tape drives. But if you can stay in the regime where
> only one merge pass is needed, that is obviously a win.

Is that still true when the multiple tapes are being multiplexed onto a single
actual file on disk?

That brings up one of my pet features though. The ability to declare multiple
temporary areas on different spindles and then have them be used on a rotating
basis. So a sort could store each tape on a separate spindle and merge them
together at full sequential i/o speed.

This would make the tradeoff between multiway merges and many passes even
harder to find though. The broader the multiway merges the more sort areas
would be used which would increase the likelihood of another sort using the
same sort area and hurting i/o performance.

--
greg

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Martijn van Oosterhout 2005-10-01 16:19:41 Re: [HACKERS] A Better External Sort?
Previous Message Tom Lane 2005-10-01 16:12:09 Re: [PATCHES] Proposed patch for sequence-renaming problems

Browse pgsql-performance by date

  From Date Subject
Next Message Martijn van Oosterhout 2005-10-01 16:19:41 Re: [HACKERS] A Better External Sort?
Previous Message Tom Lane 2005-10-01 15:44:15 Re: [HACKERS] A Better External Sort?