Re: Compression and on-disk sorting

From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Rod Taylor <pg(at)rbt(dot)ca>, "Bort, Paul" <pbort(at)tmwsystems(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Compression and on-disk sorting
Date: 2006-05-17 21:51:56
Message-ID: 20060517215155.GD42612@pervasive.com
Lists: pgsql-hackers pgsql-patches

On Wed, May 17, 2006 at 05:44:22PM -0400, Greg Stark wrote:
> "Jim C. Nasby" <jnasby(at)pervasive(dot)com> writes:
>
> > On Wed, May 17, 2006 at 12:55:53PM -0400, Greg Stark wrote:
> > >
> > > "Jim C. Nasby" <jnasby(at)pervasive(dot)com> writes:
> > >
> > > > Only if those spindles weren't all in a single RAID array and if we went
> > > > through the trouble of creating all the machinery so you could tell
> > > > PostgreSQL where all those spindles were mounted in the filesystem.
> > >
> > > I think the way you do this is simply by documenting that the admin should
> > > create precisely one temp area per physical spindle (or raid array).
> >
> > And you still need some way to tell PostgreSQL about all of that.
>
> No, my point was that you tell Postgres how many spindles you have and where to
> find them by creating precisely one temp area on each spindle. It then knows

Which means we need all the interface bits to be able to tell PostgreSQL
where every single temp storage area is. Presumably much of the
tablespace mechanism could be used for this, but it's still a bunch of
work. And you can't just say "I have 8 spindles", you have to tell
PostgreSQL exactly where to put each temporary area (unless you just
have it put one on every tablespace you have defined).

> that it should strive to maximize sequential reads within one temp area and
> expect switching between temp areas (which represent multiple spindles) to be
> better than multiplexing multiple tapes within a single temp area (which
> represents a single spindle).

Which adds yet more complexity to all the code that uses the temp area.
And as others have brought up, you still have to allow for the case when
splitting all of this out into multiple files means you end up using
substantially more disk space. That further drives up the complexity.
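
To make the scheme under discussion concrete, here is a minimal sketch of
the proposed placement policy: one temp area per spindle, with sort tapes
spread across the areas so that each spindle mostly reads sequentially.
It is an illustration only and not PostgreSQL code. The function names,
the mount paths, and the round-robin policy are all assumptions made for
the example.

```python
from collections import defaultdict


def assign_tapes(num_tapes, temp_areas):
    """Spread sort tapes round-robin over temp areas.

    Each entry in temp_areas is assumed to sit on its own spindle
    (or its own RAID array). Returns {area: [tape numbers]}.
    """
    if not temp_areas:
        raise ValueError("at least one temp area is required")
    layout = defaultdict(list)
    for tape in range(num_tapes):
        # Hypothetical policy: tape N goes to area N modulo the area count.
        area = temp_areas[tape % len(temp_areas)]
        layout[area].append(tape)
    return dict(layout)


def multiplexed_areas(layout):
    """Return the areas that hold more than one tape.

    These are the spindles that would still have to seek between tapes.
    """
    return sorted(area for area, tapes in layout.items() if len(tapes) > 1)


if __name__ == "__main__":
    # Hypothetical mount points, one per spindle.
    areas = ["/mnt/spindle0/pgtemp", "/mnt/spindle1/pgtemp"]
    layout = assign_tapes(6, areas)
    for area, tapes in layout.items():
        print(area, tapes)
    print("still multiplexed:", multiplexed_areas(layout))
```

Even in this toy form the caller has to supply the exact path of every
temp area, and any sort with more tapes than spindles still multiplexes
tapes within an area.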

My point is that unless someone shows that there's a non-trivial
performance gain here, it's not going to happen.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
