Re: Merge algorithms for large numbers of "tapes"

From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Luke Lonergan <llonergan(at)greenplum(dot)com>, Dann Corbit <DCorbit(at)connx(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Merge algorithms for large numbers of "tapes"
Date: 2006-03-09 02:08:46
Message-ID: 20060309020846.GO45250@pervasive.com
Lists: pgsql-hackers

On Wed, Mar 08, 2006 at 06:55:59PM -0500, Greg Stark wrote:
>
> "Luke Lonergan" <llonergan(at)greenplum(dot)com> writes:
>
> > > I am pretty sure from this thread that PostgreSQL is not doing #1, and I
> > > have no idea if it is doing #2.
> >
> > Yep. Even Knuth says that the tape goo is only interesting from a
> > historical perspective and may not be relevant in an era of disk drives.
>
> As the size of the data grows larger the behaviour of hard drives looks more
> and more like tapes. The biggest factor controlling the speed of i/o
> operations is how many seeks are required to complete them. Effectively
> "rewinds" are still the problem it's just that the cost of rewinds becomes
> constant regardless of how long the "tape" is.

But it will take a whole lot of those rewinds to equal the amount of
time required by an additional pass through the data. I'll venture a
guess that as long as you've got enough memory to still read chunks back
in 8k blocks, it won't be possible for a multi-pass sort to
out-perform a one-pass sort. Especially if you also had the ability to
do pre-fetching (not something to fuss with now, but certainly a
possibility in the future).
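
Purely as a hypothetical back-of-envelope (the 8ms seek time, 60MB/s
sequential rate, data sizes, and helper names below are all assumptions,
not measurements), here's a sketch of how many seeks one extra pass
through the data is "worth":

# Hypothetical back-of-envelope arithmetic; all drive parameters assumed.
SEEK_TIME = 0.008        # seconds per random seek (assumed)
SEQ_BANDWIDTH = 60e6     # bytes/sec sequential transfer (assumed)

def extra_pass_seconds(data_bytes):
    """One extra merge pass = read plus write the whole data set sequentially."""
    return 2 * data_bytes / SEQ_BANDWIDTH

def equivalent_seeks(data_bytes):
    """Number of seeks whose total cost equals that one extra pass."""
    return extra_pass_seconds(data_bytes) / SEEK_TIME

if __name__ == "__main__":
    for gb in (1, 10, 100):
        d = gb * 1024**3
        print(f"{gb:>3} GB: extra pass ~{extra_pass_seconds(d):7.0f} s "
              f"= ~{equivalent_seeks(d):10,.0f} seeks")

Even at 100GB that works out to several hundred thousand seeks before the
extra pass pays for itself, under those assumed numbers anyway.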

In any case, what we really need is at least good models backed by good
drive performance data. And we really should have that anyway so that we
can improve upon our cost estimator functions. I'm betting that what
that will show us is that no single sort method is going to work best
for all cases. I.e., I'd bet that if your data set is sufficiently larger
than available memory, you'll actually be better off with a
multi-pass approach than with a single- or two-pass approach.
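
And purely as a toy illustration of why such a crossover could exist
(again, every number and function below is an assumption for the sake of
example, not drive data): charge each pass its full sequential transfer
plus one seek per per-tape buffer refill, and the wide one-pass merge
starts losing once its per-tape buffers get small enough.

# Toy cost model for an N-pass external merge with fixed work memory.
# All drive parameters and sizes are illustrative assumptions.
SEEK_TIME = 0.008         # seconds per random seek (assumed)
SEQ_BANDWIDTH = 60e6      # bytes/sec sequential transfer (assumed)

def merge_cost(data_bytes, work_mem, passes):
    """Each pass costs sequential read+write of the data plus one seek
    per buffer refill; the per-run read buffer is work_mem divided by
    the merge fan-in needed for that many passes."""
    runs = data_bytes / work_mem            # initial sorted runs
    fan_in = runs ** (1.0 / passes)         # fan-in required per pass
    buf = work_mem / fan_in                 # read buffer per input run
    seeks_per_pass = data_bytes / buf       # one seek per buffer refill
    transfer_per_pass = 2 * data_bytes / SEQ_BANDWIDTH
    return passes * (transfer_per_pass + seeks_per_pass * SEEK_TIME)

if __name__ == "__main__":
    work_mem = 256 * 1024**2                # 256MB of sort memory (assumed)
    for gb in (10, 100, 1000):
        d = gb * 1024**3
        print(f"{gb:>4} GB: 1-pass ~{merge_cost(d, work_mem, 1):9.0f} s, "
              f"2-pass ~{merge_cost(d, work_mem, 2):9.0f} s")

With those made-up numbers the two-pass merge comes out ahead somewhere
between 100GB and 1TB; where the real crossover sits depends entirely on
measured drive behaviour, which is exactly why we need that data.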

> That's one thing that gives me pause about the current approach of using more
> tapes. It seems like ideally the user would create a temporary work space on
> each spindle and the database would arrange to use no more than that number of
> tapes. Then each merge operation would involve only sequential access for both
> reads and writes.

For that to be of any use, wouldn't you need to use only as many tapes
as spindles/2? Otherwise you're still trying to read and write from the
same set of drives, which means you're probably doing a lot of seeking.
Or do the tape algorithms re-write data as they read it?
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
