Re: CUDA Sorting

From: Gaetano Mendola <mendola(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndQuadrant(dot)com>
Subject: Re: CUDA Sorting
Date: 2012-02-12 01:14:03
Message-ID: 4F37125B.8080605@gmail.com
Lists: pgsql-hackers

On 19/09/2011 16:36, Greg Smith wrote:
> On 09/19/2011 10:12 AM, Greg Stark wrote:
>> With the GPU I'm curious to see how well
>> it handles multiple processes contending for resources, it might be a
>> flashy feature that gets lots of attention but might not really be
>> very useful in practice. But it would be very interesting to see.
>
> The main problem here is that the sort of hardware commonly used for
> production database servers doesn't have any serious enough GPU to
> support CUDA/OpenCL available. The very clear trend now is that all
> systems other than gaming ones ship with motherboard graphics chipsets
> more than powerful enough for any task but that. I just checked the 5
> most popular configurations of server I see my customers deploy
> PostgreSQL onto (a mix of Dell and HP units), and you don't get a
> serious GPU from any of them.
>
> Intel's next generation Ivy Bridge chipset, expected for the spring of
> 2012, is going to add support for OpenCL to the built-in motherboard
> GPU. We may eventually see that trickle into the server hardware side of
> things too.

The trend is toward servers that can run CUDA by attaching GPUs through
external hardware (a PCI Express interface with PCI Express switches);
see, for example, the PowerEdge C410x PCIe Expansion Chassis from Dell.

I ran some experiments timing the sort done with CUDA against the sort
done with pg_qsort:

                       CUDA       pg_qsort
33 million integers:   ~ 900 ms   ~ 6000 ms
1 million integers:    ~ 21 ms    ~ 162 ms
100k integers:         ~ 2 ms     ~ 13 ms

The CUDA times already include the copy operations (host->device, device->host).

The GPU was a C2050, and the CPU running pg_qsort was an
Intel(R) Xeon(R) CPU X5650 @ 2.67GHz.

Copy operations and kernel runs (the sort, for instance) can execute
concurrently, so while the GPU is sorting one batch of data you can
already be copying the next batch to the device.
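
To illustrate why this overlap matters, here is a minimal timing model in
Python (a sketch, not CUDA code). It assumes three independent engines
(host->device copy, sort kernel, device->host copy), each working on one
batch at a time. The function names and the per-batch durations are
hypothetical, chosen only for illustration; they are not measurements.

```python
def serial_time(stage_ms, batches):
    """Total time when every batch runs copy-in, sort, copy-out back to back."""
    return batches * sum(stage_ms)


def pipelined_time(stage_ms, batches):
    """Total time when each stage can work on a different batch at once.

    A stage starts on a batch as soon as (a) the batch has left the previous
    stage and (b) the stage has finished the previous batch.
    """
    finish = [0.0] * len(stage_ms)  # finish time of the latest batch, per stage
    for _ in range(batches):
        ready = 0.0  # time at which this batch leaves the previous stage
        for j, duration in enumerate(stage_ms):
            start = max(ready, finish[j])
            finish[j] = start + duration
            ready = finish[j]
    return finish[-1]


# Hypothetical per-batch durations in ms:
# host->device copy, sort kernel, device->host copy.
stages = [2.0, 5.0, 2.0]
print(serial_time(stages, 4))     # 36.0
print(pipelined_time(stages, 4))  # 24.0
```

With these made-up numbers the sort stage dominates, so after the first
batch the copies are hidden almost entirely behind the sort.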

As you can see, the boost is not negligible.
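
As a rough check, the speedup can be worked out from the timings quoted
earlier in this message. This is a small Python sketch; the `speedup`
helper and the dictionary layout are my own naming, and the inputs are
the approximate figures from the table, so the ratios are approximate too.

```python
# Timings in milliseconds, copied from the table: (CUDA, pg_qsort).
timings_ms = {
    "33 million integers": (900, 6000),
    "1 million integers": (21, 162),
    "100k integers": (2, 13),
}


def speedup(cuda_ms, qsort_ms):
    """How many times faster the CUDA sort is than pg_qsort."""
    return qsort_ms / cuda_ms


for label, (cuda_ms, qsort_ms) in timings_ms.items():
    print(f"{label}: {speedup(cuda_ms, qsort_ms):.1f}x")
# 33 million integers: 6.7x
# 1 million integers: 7.7x
# 100k integers: 6.5x
```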

The next generation of Nvidia hardware (the Kepler family) is PCI Express 3
ready, so in the near future expect the "bottleneck" of the host->device
and device->host copies to have less impact.

I strongly believe there is room to give a modern database engine
a way to offload sorts to the GPU.

> I've never seen a PostgreSQL server capable of running CUDA, and I
> don't expect that to change.

That sounds like:

"I think there is a world market for maybe five computers."
- IBM Chairman Thomas Watson, 1943

Regards
Gaetano Mendola
