Re: Sorting Improvements for 8.4

From: Michał Zaborowski <michal(dot)zaborowski(at)gmail(dot)com>
To: "Mark Mielke" <mark(at)mark(dot)mielke(dot)cc>
Cc: "Jeff Davis" <pgsql(at)j-davis(dot)com>, "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Ron Mayer" <rm_pg(at)cheapcomplexdevices(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Sorting Improvements for 8.4
Date: 2007-12-19 10:31:56
Message-ID: e2289d9e0712190231u6d1cd5e0qe57643c99492e4a5@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

2007/12/19, Mark Mielke <mark(at)mark(dot)mielke(dot)cc>:
> Jeff Davis wrote:
> > My first thought would be that we would need a new executor node (e.g.
> > "ParallelSort") that would only be chosen when the cost of the sort is
> > large enough to outweigh other factors (such as process creation time,
> > dividing available work_mem, and any necessary IPC).
> >
> > It seems to me the simplest way to do it would be to allow each sub
> > process to allocate work_mem/P where P is the degree of parallelization.
> > However, that somewhat works against our schemes for dynamic run
> > handling and forecasting, and may lead to more random reading from disk.
> > Any other scheme I can think of would involve more IPC, which might make
> > the idea just too complex.
> >
> I am curious - what algorithms exist to efficiently do a parallel sort?
> Do you mean if sorting 1 million items, it is possible to separate this
> into 2 sets of 500 thousand each, execute them in separate threads
> (with task administration and synchronization overhead) , combine the
> results, and complete the task in significantly less time than doing it
> in one thread? I am skeptical that this is possible, and suspect that
> the overall efficiency of the system would go down even if the
> throughput of a single execution increases.
>
Ok - we want to sort table with quick sort and we want to do it on - N threads.
Every thread - gets begin and end of indices of the table. First step starts
at 0 and lasts with count -1. Single step: find medium value and move
lover before it and bigger after. In normal case - we use recursive call - so
the same procedure is being called for that two parts. In thread we can put
indices at side list - and use queue of threads to pick up data from the list.
We can use common table, access to side list with indices has to be serialized.

> Or do you mean being able to perform parts of the query plan fully in
> parallel? If this, then one would need a lot more than ParallelSort...
>
Nice to have, but rather for data warehouses. In other cases... IE - backend
for Internet - there are many requests and every processor / core works nice.

--
Regards,
Michał Zaborowski (TeXXaS)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Gregory Stark 2007-12-19 11:25:36 Testing mail list
Previous Message Mark Mielke 2007-12-19 08:22:22 Re: Sorting Improvements for 8.4