Skip site navigation (1) Skip section navigation (2)

Re: Sorting Improvements for 8.4

From: "Joris Dobbelsteen" <Joris(at)familiedobbelsteen(dot)nl>
To: "Ron Mayer" <rm_pg(at)cheapcomplexdevices(dot)com>,"Mark Mielke" <mark(at)mark(dot)mielke(dot)cc>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Sorting Improvements for 8.4
Date: 2007-12-27 14:42:57
Message-ID: E4953B65D9E5054AA6C227B410C56AA9C23E@exchange1.joris2k.local (view raw or flat)
Thread:
Lists: pgsql-hackers
>-----Original Message-----
>From: pgsql-hackers-owner(at)postgresql(dot)org 
>[mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of Ron Mayer
>Sent: Wednesday, 19 December 2007 19:26
>To: Mark Mielke; pgsql-hackers(at)postgresql(dot)org
>Subject: Re: [HACKERS] Sorting Improvements for 8.4
>
>> Or do you mean being able to perform parts of the query plan 
>fully in 
>> parallel? If this, then one would need a lot more than 
>ParallelSort...
>
>I wouldn't recommend that - it seems like a Hard Problem.
>
>My guess is that the best way to use multiple threads in one 
>backend would be to find specific algorithms like sorting that 
> would be easier to isolate.

To give my view on this problem: if I'm looking at a competing
(commercial) database product, they added some operations called
"parallize" and "combine". Basically they split the data across several
threads at one point and combine them later. This is basically what you
are also implementing for "parallelsort", but as a single step in the
query exeuction.

In my opinion your starting point is too narrow and specific, especially
since a fairly simple generalization is possible. Instead, the issue
becomes the spill-to-disk code that needs to operate in parallel (which
needs to be tackled sooner or later anyways).

If you can change the sort into three steps: parallelize, sort (multiple
parallel instances) and combine (merge) you still have the same base
case. However I believe such a thing is much easier to extend to more
operations.

Futhermore it seems that cache is a considered a major problem,
especially the varying sizes. Wouldn't a cache-oblivious algorithm, like
<http://erikdemaine.org/papers/BRICS2002/> or
<http://etd.uwaterloo.ca/etd/afarzan2004.pdf> be a good starting point
for refinements on sort algorithm itself?
I believe you can get a more consistent performance depending on the
cache sizes, but it might be slower than a well-tuned quicksort.

Just my EUR 0,02...

- Joris


In response to

pgsql-hackers by date

Next:From: Andreas 'ads' ScherbaumDate: 2007-12-27 15:19:09
Subject: Re: Binary data type with other output method
Previous:From: Andrew DunstanDate: 2007-12-27 13:52:15
Subject: Re: Binary data type with other output method

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group