Re: CUDA Sorting

From: Dann Corbit <DCorbit(at)connx(dot)com>
To: 'Gaetano Mendola' <mendola(at)gmail(dot)com>, Peter Geoghegan <peter(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CUDA Sorting
Date: 2012-02-16 00:37:40
Message-ID: 87F42982BF2B434F831FCEF4C45FC33E505D48EE@EXCHANGE.corporate.connx.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

-----Original Message-----
From: pgsql-hackers-owner(at)postgresql(dot)org [mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of Gaetano Mendola
Sent: Wednesday, February 15, 2012 2:54 PM
To: Peter Geoghegan; pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] CUDA Sorting

On 15/02/2012 23:11, Peter Geoghegan wrote:
> On 15 February 2012 20:00, Gaetano Mendola<mendola(at)gmail(dot)com> wrote:
>> On 13/02/2012 19:48, Greg Stark wrote:
>>>
>>> I don't think we should be looking at either CUDA or OpenCL directly.
>>> We should be looking for a generic library that can target either
>>> and is well maintained and actively developed. Any GPU code we write
>>> ourselves would rapidly be overtaken by changes in the hardware and
>>> innovations in parallel algorithms. If we find a library that
>>> provides a sorting api and adapt our code to use it then we'll get
>>> the benefits of any new hardware feature as the library adds support for them.
>>>
>>
>> I think one option is to make the sort function pluggable with a
>> shared library/dll. I see several benefits from this:
>>
>> - It could be in the interest of the hardware vendor to provide the
>> most powerful sort implementation (I'm sure for example that TBB sort
>> implementation is faster that pg_sort)
>>
>> - It can permit people to "play" with it without being deep
>> involved in pg development and stuffs.
>
> Sorry, but I find it really hard to believe that the non-availability
> of pluggable sorting is what's holding people back here. Some vanguard
> needs to go and prove the idea by building a rough prototype before we
> can even really comment on what an API should look like. For example,
> I am given to understand that GPUs generally sort using radix sort -
> resolving the impedance mismatch that prevents someone from using a
> non-comparison based sort sure sounds like a lot of work for an
> entirely speculative reward.

AFAIK thrust library uses the radix sort if the keys you are sorting are POD data comparable with a "<" operator otherwise it does the comparison based sort using the operator provided.

http://docs.thrust.googlecode.com/hg/modules.html

I'm not saying that the non-availability of pluggable sort completely holds people back, I'm saying that it will simplify the process now and int the future, of course that's my opinion.

> Someone who cannot understand tuplesort, which is not all that
> complicated, has no business trying to build GPU sorting into
> Postgres.

That sounds a bit harsh. I'm one of those indeed, I haven't look in the details not having enough time for it. At work we do GPU computing (not the sort type stuff) and given the fact I'm a Postgres enthusiast I asked my self: "my server is able to sort around 500 milions integer per seconds, if postgres was able to do that as well it would be very nice".

What I have to say? Sorry for my thoughts.

> I had a patch committed a few hours ago that almost included the
> capability of assigning an alternative sorting function, but only one
> with the exact same signature as my variant of qsort_arg. pg_qsort
> isn't used to sort tuples at all, by the way.

Then I did look in the wrong direction. Thank you for point that out.

> Threading building blocks is not going to form the basis of any novel
> sorting implementation, because comparators in general are not thread
> safe, and it isn't available on all the platforms we support, and
> because of how longjmp interacts with C++ stack unwinding and so on
> and so on. Now, you could introduce some kind of parallelism into
> sorting integers and floats, but that's an awful lot of work for a
> marginal reward.

The TBB was just example that did come in my mind.
What do you mean with you could introduce some kind of parallelism?
As far as I know any algorithm using the divide and conquer can be parallelized.
>>
Radix sorting can be used for any data type, if you create a callback that provides the most significant bits in "width" buckets. At any rate, I can't imagine why anyone would want to complain about sorting 40 times faster than before, considering the amount of time database spend in ordering data.

I have a Cuda card in this machine (NVIDIA GeForce GTX 460) and I would not mind it a bit if my database "ORDER BY" clause suddenly started running ten times faster than before when I am dealing with a huge volume of data.

There have been other experiments along these lines such as:
GPU-based Sorting in PostgreSQL Naju Mancheril, School of Computer Science - Carnegie Mellon University
www.cs.virginia.edu/~skadron/Papers/bakkum_sqlite_gpgpu10.pdf (This is for SQLite, but the grammar of SQLite is almost a pure subset of PostgreSQL, including things like vacuum...)
http://wiki.postgresql.org/images/6/65/Pgopencl.pdf
http://dl.acm.org/citation.cfm?id=1807207
http://www.scribd.com/doc/51484335/PostgreSQL-OpenCL-Procedural-Language-pgEast-March-2011

See also
http://highscalability.com/scaling-postgresql-using-cuda

<<

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Han Zhou 2012-02-16 01:44:53 Re: Fwd: [HACKERS] client performance v.s. server statistics
Previous Message Peter Geoghegan 2012-02-16 00:30:13 Re: CUDA Sorting