Re: Merge algorithms for large numbers of "tapes"

From: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
To: "Dann Corbit" <DCorbit(at)connx(dot)com>
Cc: gsstark(at)mit(dot)edu, "Luke Lonergan" <llonergan(at)greenplum(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, "Simon Riggs" <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Merge algorithms for large numbers of "tapes"
Date: 2006-03-09 01:43:39
Message-ID: 36e682920603081743h463bb808y52eac97a2e1aff92@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

An interesting read at

http://www.vldb.org/conf/1997/P376.PDF

On 3/8/06, Dann Corbit <DCorbit(at)connx(dot)com> wrote:
>
> > -----Original Message-----
> > From: gsstark(at)mit(dot)edu [mailto:gsstark(at)mit(dot)edu]
> > Sent: Wednesday, March 08, 2006 3:56 PM
> > To: Luke Lonergan
> > Cc: Dann Corbit; Tom Lane; Jim C. Nasby; Simon Riggs; pgsql-
> > hackers(at)postgresql(dot)org
> > Subject: Re: [HACKERS] Merge algorithms for large numbers of "tapes"
> >
> >
> > "Luke Lonergan" <llonergan(at)greenplum(dot)com> writes:
> >
> > > > I am pretty sure from this thread that PostgreSQL is not doing #1,
> and
> > I
> > > > have no idea if it is doing #2.
> > >
> > > Yep. Even Knuth says that the tape goo is only interesting from a
> > > historical perspective and may not be relevant in an era of disk
> drives.
> >
> > As the size of the data grows larger the behaviour of hard drives
> looks
> > more
> > and more like tapes. The biggest factor controlling the speed of i/o
> > operations is how many seeks are required to complete them.
> Effectively
> > "rewinds" are still the problem it's just that the cost of rewinds
> becomes
> > constant regardless of how long the "tape" is.
> >
> > That's one thing that gives me pause about the current approach of
> using
> > more
> > tapes. It seems like ideally the user would create a temporary work
> space
> > on
> > each spindle and the database would arrange to use no more than that
> > number of
> > tapes. Then each merge operation would involve only sequential access
> for
> > both
> > reads and writes.
>
> If the chief concern is in the number of subfiles created, replacement
> selection doubles the length of the subfiles while consuming no more
> memory.
> {The big-O of the algorithm sucks, though}
>
> It is certainly worth testing several cases.
>
> It is not a bad idea to enable more than one method of performing an
> operation.
>
> In the ideal case, you would have specific information about drives,
> spindles, rates for seek, transfer, etc.
>
> It all depends on how much effort you want to throw at it.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faq
>

--
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
732.331.1324

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Jim C. Nasby 2006-03-09 01:44:08 Re: Merge algorithms for large numbers of "tapes"
Previous Message Jim C. Nasby 2006-03-09 01:31:39 Re: Merge algorithms for large numbers of "tapes"