Re: Sorting Improvements for 8.4

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
Cc: Michał Zaborowski <michal(dot)zaborowski(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Sorting Improvements for 8.4
Date: 2007-12-19 18:18:33
Message-ID: 1198088313.28804.387.camel@dogma.ljc.laika.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 2007-12-19 at 12:08 -0500, Mark Mielke wrote:
> Stupid question #2: Is it well recognized that the CPU is the
> bottleneck in the PostgreSQL sorting mechanism? Or might it be memory
> bandwidth and I/O?
>

I think it depends a lot on several factors. It's probably a different
bottleneck for integers versus localized text, and depends on the
available memory and I/O characteristics.

>
> It would seem to me that any sort worth parallelizing (administrative
> and synchronization overhead), must have data larger than the L2
> cache. If larger than the L2 cache, it becomes real memory speed. If
> real memory speed, wouldn't one CPU without hardware synchronization,
> be able to fill the memory read/write pipe?

We do an external merge sort, which involves merging M runs together.
You seem to be implying that we can generate the output run at disk
speed, and therefore the CPU speed is unimportant.

I suspect that the comparison costs are enough that the above statement
isn't true in all cases, particularly in the case of localized text.

Also, there is probably a lot of memory copying going on, and that
probably destroys a lot of the effectiveness of L2 caching. When L2
caching is ineffective, the CPU spends a lot of time just waiting on
memory. In that case, it's better to have P threads of execution all
waiting on memory operations in parallel.

This would explain why "1p2t" would outperform a "1p1t" in Ron's
reference above.

These are just my first thoughts, however. There is a lot of existing
research out there that we can look into, and also a lot of tests that
we can run before jumping into this.

I think parallel sorting can be looked into separately from the other
sorting improvements.

Regards,
Jeff Davis

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ron Mayer 2007-12-19 18:26:17 Re: Sorting Improvements for 8.4
Previous Message Gokulakannan Somasundaram 2007-12-19 18:00:57 Re: [HACKERS] Proposal for Null Bitmap Optimization(for TrailingNULLs)