Re: Compression and on-disk sorting

From: Greg Stark <gsstark(at)mit(dot)edu>
To: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Rod Taylor <pg(at)rbt(dot)ca>, "Bort, Paul" <pbort(at)tmwsystems(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Compression and on-disk sorting
Date: 2006-05-18 04:24:54
Message-ID: 87r72sot7t.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

"Jim C. Nasby" <jnasby(at)pervasive(dot)com> writes:

> Which means we need all the interface bits to be able to tell PostgreSQL
> where every single temp storage area is. Presumably much of the
> tablespace mechanism could be used for this, but it's still a bunch of
> work. And you can't just say "I have 8 spindles", you have to tell
> PostgreSQL exactly where to put each temporary area (unless you just
> have it put one on every tablespace you have defined).

Yes, if you have more than one temporary area you definitely need a way to
tell Postgres where to put them since obviously they won't be in the postgres
base directory. But I think that's all Postgres really needs to know.

One could imagine a more complex version where Postgres has meta information
about the bandwidth and seek penalty for each sort area separately. Presumably
also for each table space. But that's a whole lot more complexity than
Postgres's current cost model.

> > that it should strive to maximize sequential reads within one temp area and
> > expect switching between temp areas (which represent multiple spindles) to be
> > better than multiplexing multiple tapes within a single temp area (which
> > represents a single spindle).
>
> Which adds yet more complexity to all the code that uses the temp area.
> And as others have brought up, you still have to allow for the case when
> splitting all of this out into multiple files means you end up using
> substantially more disk space. That further drives up the complexity.

You also have to consider that it won't always be a benefit to spread the sort
over multiple sort areas. If there's only one sort going on and you can reach
a 1:1 ratio between tapes and spindles then I think it would be a huge
benefit. Effectively boosting the sort speed by random_page_cost.

But if you don't have as many spindles as your algorithm needs tapes
then it's unclear which to multiplex down and whether you gain any benefit
once you're multiplexing over simply using a single sort area.

And worse, if there are multiple sorts going on in the system then you're not
going to get sequential access even if you have multiple sort areas available.
If you have N sort areas and N sorts are going on then you're probably better
off multiplexing each one down to a single sort area and letting them each
proceed without interfering with each other rather than having each one hog
all the sort areas and forcing the OS to do the multiplexing blindly.

> My point is that unless someone shows that there's a non-trivial
> performance gain here, it's not going to happen.

I think two extreme cases are well worth pursuing:

1) Use n sort areas for n tapes making everything purely sequential access.
That would be useful for large DSS systems where large sorts are running
and i/o bandwidth is high for sequential access. That gives effectively a
random_page_cost speed boost.

2) Use the current algorithm unchanged but have each sort use a different sort
area in some sort of round-robin fashion. That helps the OLTP type
environment (ignoring for the moment that OLTP environments really ought
not be doing disk sorts) where people complain about unreliable execution
times more than slow execution times. If you can provide enough sort areas
then it would remove one big reason other queries concurrent impact the
execution time of queries.

--
greg

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Frost 2006-05-18 04:39:40 Re: does wal archiving block the current client connection?
Previous Message Jaime Casanova 2006-05-18 04:13:00 Re: PL/pgSQL 'i = i + 1' Syntax

Browse pgsql-patches by date

  From Date Subject
Next Message Tom Lane 2006-05-18 05:45:03 buildfarm failures
Previous Message Bruce Momjian 2006-05-18 03:20:39 Re: small doc patch for regexp_replace