Re: Sorting Improvements for 8.4

From: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Michał Zaborowski <michal(dot)zaborowski(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Sorting Improvements for 8.4
Date: 2007-12-19 20:51:50
Message-ID: 47698466.4070609@mark.mielke.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Jeff Davis wrote:
> On Wed, 2007-12-19 at 12:08 -0500, Mark Mielke wrote:
>
>> Stupid question #2: Is it well recognized that the CPU is the
>> bottleneck in the PostgreSQL sorting mechanism? Or might it be memory
>> bandwidth and I/O?
>>
> I think it depends a lot on several factors. It's probably a different
> bottleneck for integers versus localized text, and depends on the
> available memory and I/O characteristics.
>
Makes sense.

>> It would seem to me that any sort worth parallelizing (administrative
>> and synchronization overhead), must have data larger than the L2
>> cache. If larger than the L2 cache, it becomes real memory speed. If
>> real memory speed, wouldn't one CPU without hardware synchronization,
>> be able to fill the memory read/write pipe?
>>
> We do an external merge sort, which involves merging M runs together.
> You seem to be implying that we can generate the output run at disk
> speed, and therefore the CPU speed is unimportant.
>
Correct. Or, alternatively, you could achieve the same effect using
asychronous I/O or read ahead.
> I suspect that the comparison costs are enough that the above statement
> isn't true in all cases, particularly in the case of localized text.
That sounds possible, but I still feel myself suspecting that disk reads
will be much slower than localized text comparison. Perhaps I am
overestimating the performance of the comparison function?

>
> Also, there is probably a lot of memory copying going on, and that
> probably destroys a lot of the effectiveness of L2 caching. When L2
> caching is ineffective, the CPU spends a lot of time just waiting on
> memory. In that case, it's better to have P threads of execution all
> waiting on memory operations in parallel.
>
I didn't consider the high throughput / high latency effect. This could
be true if the CPU prefetch isn't effective enough.

> This would explain why "1p2t" would outperform a "1p1t" in Ron's
> reference above.
>
> These are just my first thoughts, however. There is a lot of existing
> research out there that we can look into, and also a lot of tests that
> we can run before jumping into this.
>
> I think parallel sorting can be looked into separately from the other
> sorting improvements.
>
Yep - I started to read up on it. It still sounds like it's a hard-ish
problem (to achieve near N times speedup for N CPU cores without
degrading performance for existing loads), but that doesn't mean
impossible. :-)

Cheers,
mark

--
Mark Mielke <mark(at)mielke(dot)cc>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dann Corbit 2007-12-19 22:41:37 Re: Sorting Improvements for 8.4
Previous Message Mark Mielke 2007-12-19 20:45:39 Re: Sorting Improvements for 8.4