Re: CUDA Sorting

From: Greg Stark <stark(at)mit(dot)edu>
To: Vitor Reus <vitor(dot)reus(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: CUDA Sorting
Date: 2011-09-19 14:12:41
Message-ID: CAM-w4HOR1CTsn1YJYnP3YC_vDg7nduCs6oXnmNCf9R-_+SM0tQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Sep 19, 2011 at 1:11 PM, Vitor Reus <vitor(dot)reus(at)gmail(dot)com> wrote:
> Since I'm new to pgsql development, I replaced the code of pgsql
> qsort_arg to get used with the way postgres does the sort. The problem
> is that I can't use the qsort_arg_comparator comparator function on
> GPU, I need to implement my own. I didn't find out how to access the
> sorting key value data of the tuples on the Tuplesortstate or
> SortTuple structures. This part looks complicated because it seems the
> state holds the pointer for the scanner(?), but I didn't managed to
> access the values directly. Can anyone tell me how this works?
>

This is something I've been curious about for a while. The biggest
difficulty is that Postgres has a user-extensible type system and
calls user provided functions to do things like comparisons. Postgres
only supports comparison sorts and does so by calling the user
function for the data type being sorted.

These user defined function is looked up earlier in the query parsing
and analysis phase and stored in Tuplesortstate->scanKeys which is an
array of structures that hold information about the ordering required.
In there there's a pointer to the function, a set of flags (such as
NULLS FIRST/LAST), the text collation needed and the collation.

I assume you're going to have to have tuplesort.c recognize if all the
comparators are one of a small set of standard comparators that you
can implement on the GPU such as integer and floating point
comparison. In which case you could call a specialized qsort which
implements that comparator inlined instead of calling the standard
function. That might actually be a useful optimization to do anyways
since it may well be much faster even without the GPU. So that would
probably be a good place to start.

But the barrier to get over here might be relatively high. In order to
tolerate that amount of duplicated code and special cases there would
have to be benchmarks showing it's significantly faster and helps
real-world user queries. It would also have to be pretty cleanly
implemented so that it doesn't impose a lot of extra overhead every
time this code needs to be changed -- for example when adding
collations it would have been unfortunate to have to add it to half a
dozen specializations of tuplesort (though frankly I don't think that
would have made that much of a dent in the happiness of the people who
worked on collations).

All that said my personal opinion is that this can be done cleanly and
would be more than worth the benefit even without the GPU -- sorting
integers and floating point numbers is a very common case and Peter
Geoghan recently showed our qsort could be about twice as fast if it
could inline the comparisons. With the GPU I'm curious to see how well
it handles multiple processes contending for resources, it might be a
flashy feature that gets lots of attention but might not really be
very useful in practice. But it would be very interesting to see.

--
greg

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2011-09-19 14:18:03 Re: /proc/self/oom_adj is deprecated in newer Linux kernels
Previous Message Josh Berkus 2011-09-19 13:50:12 Re: Is there really no interest in SQL Standard?