Re: Sorting Improvements for 8.4

From: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
To: Michał Zaborowski <michal(dot)zaborowski(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Sorting Improvements for 8.4
Date: 2007-12-19 17:08:23
Message-ID: 47695007.40208@mark.mielke.cc
Lists: pgsql-hackers

Michał Zaborowski wrote:
> OK - we want to sort a table with quicksort, and we want to do it on N
> threads. Every thread gets the begin and end indices of its part of the
> table. The first step starts at 0 and ends at count - 1. A single step:
> find the pivot value and move lower values before it and bigger values
> after it. In the normal case we use a recursive call, so the same
> procedure is called for those two parts. With threads, we can instead put
> the index ranges on a side list and use a queue of threads to pick up
> work from that list. We can use a common table; access to the side list
> of indices has to be serialized.
>
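For concreteness, here is a rough sketch of how I read the scheme above
(my own toy code over a plain C array of ints - nothing to do with
tuplesort, and all names and sizes are made up): a shared 'side list' of
[lo, hi] ranges protected by a mutex, and a fixed pool of worker threads
that pop a range, partition it, and push the two halves back. Note that
even this minimal version needs the list serialized, plus a counter so
the workers know when the whole sort is finished:

/* Hypothetical illustration only -- not PostgreSQL code.  gcc -std=c99 -pthread */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NWORKERS  4
#define MAXRANGES 8192   /* ample for the test size below */
#define CUTOFF    1024   /* ranges at or below this size are sorted serially */

struct range { long lo, hi; };

static int             *data;
static struct range     list[MAXRANGES];   /* the shared "side list" */
static int              nranges;            /* ranges waiting in the list */
static int              pending;            /* waiting + currently being worked on */
static pthread_mutex_t  lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t   more = PTHREAD_COND_INITIALIZER;

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *) a, y = *(const int *) b; return (x > y) - (x < y);
}

/* classic in-place partition around data[hi]; returns the pivot's final index */
static long partition(long lo, long hi)
{
    int  pivot = data[hi], t;
    long i = lo;
    for (long j = lo; j < hi; j++)
        if (data[j] < pivot)
        {
            t = data[i]; data[i] = data[j]; data[j] = t;
            i++;
        }
    t = data[i]; data[i] = data[hi]; data[hi] = t;
    return i;
}

static void push_range(long lo, long hi)    /* caller must hold the lock */
{
    if (nranges == MAXRANGES)
        abort();                            /* toy limit; a real version must cope */
    list[nranges].lo = lo;
    list[nranges].hi = hi;
    nranges++;
    pending++;
    pthread_cond_signal(&more);
}

static void *worker(void *arg)
{
    for (;;)
    {
        pthread_mutex_lock(&lock);
        while (nranges == 0 && pending > 0)
            pthread_cond_wait(&more, &lock);
        if (pending == 0)                   /* nothing queued, nothing in progress */
        {
            pthread_mutex_unlock(&lock);
            return arg;
        }
        struct range r = list[--nranges];
        pthread_mutex_unlock(&lock);

        long p = -1;
        if (r.hi - r.lo + 1 <= CUTOFF)      /* small range: just finish it here */
            qsort(data + r.lo, r.hi - r.lo + 1, sizeof(int), cmp_int);
        else
            p = partition(r.lo, r.hi);      /* large range: split around the pivot */

        pthread_mutex_lock(&lock);
        if (p >= 0)
        {
            if (p - 1 > r.lo) push_range(r.lo, p - 1);
            if (p + 1 < r.hi) push_range(p + 1, r.hi);
        }
        if (--pending == 0)                 /* the popped range is now fully done */
            pthread_cond_broadcast(&more);  /* wake any sleepers so they can exit */
        pthread_mutex_unlock(&lock);
    }
}

int main(void)
{
    long      n = 4 * 1000 * 1000;          /* arbitrary test size */
    pthread_t th[NWORKERS];
    data = malloc(n * sizeof(int));
    for (long i = 0; i < n; i++)
        data[i] = rand();
    pthread_mutex_lock(&lock);
    push_range(0, n - 1);                   /* seed the list with the whole array */
    pthread_mutex_unlock(&lock);
    for (int i = 0; i < NWORKERS; i++)
        pthread_create(&th[i], NULL, worker, NULL);
    for (int i = 0; i < NWORKERS; i++)
        pthread_join(th[i], NULL);
    printf("sorted %ld ints with %d workers\n", n, NWORKERS);
    return 0;
}

Even in this toy, every pop and push of the side list goes through the
mutex, which is the synchronization cost I keep coming back to below.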
Stupid question #2: Is it well recognized that the CPU is the bottleneck
in the PostgreSQL sorting mechanism? Or might it be memory bandwidth and
I/O?

It would seem to me that any sort worth parallelizing (given the
administrative and synchronization overhead) must have data larger than
the L2 cache. If it is larger than the L2 cache, the sort runs at real
memory speed. If it runs at real memory speed, wouldn't one CPU, without
any hardware synchronization, be able to fill the memory read/write pipe
on its own? And if we 'divide and conquer' to parallelize, wouldn't the
values written by one thread often (1 / N of the time) need to be read
by another thread, requiring hardware data synchronization?

I see the wikipedia.org page describes how easy it is to parallelize
quick sort, and how performance scales linearly with the number of
processors, but I don't see references to back this claim.
At least some of the steps seem difficult or impractical to
parallelize. For example, the initial partition pass, which moves items
lower than the pivot to the left and items higher than the pivot to the
right, would not be easy to parallelize as an in-place reorder. The
partition has to finish moving elements around before there is anything
to 'divide and conquer'. They say no synchronization is required, but I
think they are missing the hardware synchronization that is still needed
(especially in the innermost loops, where each thread's task becomes
shorter and starts to fit in L1/L2). They say linear, but then talk
about a 'new thread being created'. New thread creation has a cost, and
if that is avoided by using a thread pool, then synchronization *is*
required.
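
To put a rough number on that last point (my own back-of-envelope
work/span estimate, counting element scans only, assuming perfectly
balanced partitions, and ignoring every memory and synchronization cost
above): if each partition pass is done by a single thread and only the
recursion is parallelized, then

    T_1 \approx n \log_2 n , \qquad
    T_\infty \approx n + \tfrac{n}{2} + \tfrac{n}{4} + \cdots \approx 2n ,
    \qquad
    \text{speedup} \;\le\; \frac{T_1}{T_\infty} \;\approx\; \frac{\log_2 n}{2} .

For n = 10^7 that caps the speedup at roughly 11-12x however many CPUs
are available, and even at 4 CPUs the fully sequential first pass and the
only two-way-parallel second pass pull the ideal 4x down to roughly 3.4x
in this idealized model, before any cache or synchronization costs are
counted.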

It sounds like a 'nice in theory' idea. :-) Which doesn't mean it is
wrong...

I am curious enough to write a test...
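For the bandwidth question above, the simplest test I can think of is
along these lines (my own toy code, nothing measured yet; buffer size,
pass count and thread counts are arbitrary assumptions): copy a buffer
much larger than L2 with one thread, then with the buffer split across
two, three, four threads, and compare the aggregate GB/s. If one thread
already fills the memory pipe, the multi-thread numbers will barely move,
and that would put a hard ceiling on what a parallel in-memory sort could
gain:

/* Hypothetical illustration only.  gcc -std=c99 -pthread */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>

#define BUF_BYTES  (128 * 1024 * 1024)   /* well past any L2 cache */
#define PASSES     8
#define MAXTHREADS 4

struct job { char *src, *dst; size_t bytes; };

static double now_sec(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1e6;
}

static void *copier(void *arg)
{
    struct job *j = arg;
    for (int p = 0; p < PASSES; p++)
        memcpy(j->dst, j->src, j->bytes);   /* read + write the whole slice */
    return NULL;
}

/* copy the buffer with nthreads threads, each owning its own slice */
static void run(int nthreads, char *src, char *dst)
{
    pthread_t  th[MAXTHREADS];
    struct job job[MAXTHREADS];
    size_t     slice = BUF_BYTES / nthreads;
    double     t0 = now_sec();

    for (int i = 0; i < nthreads; i++)
    {
        job[i].src = src + i * slice;
        job[i].dst = dst + i * slice;
        job[i].bytes = slice;
        pthread_create(&th[i], NULL, copier, &job[i]);
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(th[i], NULL);

    double secs = now_sec() - t0;
    double gb = 2.0 * PASSES * (double) slice * nthreads / 1e9;  /* read + write */
    printf("%d thread(s): %.2f GB/s\n", nthreads, gb / secs);
}

int main(void)
{
    char *src = malloc(BUF_BYTES);
    char *dst = malloc(BUF_BYTES);

    memset(src, 1, BUF_BYTES);   /* touch the pages before timing */
    memset(dst, 0, BUF_BYTES);
    for (int n = 1; n <= MAXTHREADS; n++)
        run(n, src, dst);
    free(src);
    free(dst);
    return 0;
}

Of course a real sort does more work per byte than memcpy, so this
overstates the memory pressure; but if the 1-thread and 4-thread numbers
come out close, the 'one CPU can fill the pipe' suspicion is probably
right.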
>> Or do you mean being able to perform parts of the query plan fully in
>> parallel? If so, then one would need a lot more than ParallelSort...
> Nice to have, but mostly for data warehouses. In other cases - e.g. a
> backend for an Internet application - there are many requests, and
> every processor / core is already kept nicely busy.
>
I'm a fan of the 'each plan item is a task that is assigned to the
pool, with each CPU grabbing tasks from the pool' approach. Another
'nice in theory' idea (used by DB2?). As it is, though, I think
PostgreSQL planning is heavily designed to maximize performance on a
single CPU, and single queries would not easily scale to multiple CPUs.
(Perhaps hashing could be done on another CPU, or, as you describe
above, sorting.)
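
A minimal sketch of what I mean, purely to make the shape of it concrete
(my own toy code, a long way from the real executor: 'plan items' are
bare function pointers here, and a real version would have to handle
dependencies between nodes, shared state, and the fact that backends are
processes rather than threads): the independent pieces of a plan go into
a table of tasks, and each spare CPU just grabs the next unclaimed one:

/* Hypothetical illustration only -- nothing like the real executor. */
#include <pthread.h>
#include <stdio.h>

typedef void (*plan_func) (void);

static void scan_part_a(void) { printf("seq scan, partition A\n"); }
static void scan_part_b(void) { printf("seq scan, partition B\n"); }
static void build_hash(void)  { printf("build hash table\n"); }
static void sort_outer(void)  { printf("sort outer relation\n"); }

/* the "pool": plan items with no dependencies between them */
static plan_func tasks[] = { scan_part_a, scan_part_b, build_hash, sort_outer };
#define NTASKS (sizeof(tasks) / sizeof(tasks[0]))

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned next_task = 0;

static void *cpu_worker(void *arg)
{
    for (;;)
    {
        pthread_mutex_lock(&lock);          /* even this scheme needs a lock */
        unsigned i = next_task++;
        pthread_mutex_unlock(&lock);
        if (i >= NTASKS)
            return arg;                     /* no plan items left to grab */
        tasks[i]();                         /* run the plan item */
    }
}

int main(void)
{
    pthread_t th[2];                        /* pretend we have two spare CPUs */

    for (int i = 0; i < 2; i++)
        pthread_create(&th[i], NULL, cpu_worker, NULL);
    for (int i = 0; i < 2; i++)
        pthread_join(th[i], NULL);
    return 0;
}

The hard part is everything this leaves out: deciding which plan nodes
really are independent and getting their results back together, which is
exactly where the single-CPU assumptions in the planner bite.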

Cheers,
mark

--
Mark Mielke <mark(at)mielke(dot)cc>
