Re: Using quicksort for every external sort run

From: Ants Aasma <ants(dot)aasma(at)eesti(dot)ee>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Peter Geoghegan <pg(at)heroku(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Using quicksort for every external sort run
Date: 2015-12-12 19:42:05
Message-ID: CA+CSw_t0nss16bwp-DzRyUy9e847hQCZVV+j0o+MNcrpLJnVDw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Dec 12, 2015 at 12:41 AM, Greg Stark <stark(at)mit(dot)edu> wrote:
> On Wed, Dec 9, 2015 at 2:44 AM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
>> On Tue, Dec 8, 2015 at 6:39 PM, Greg Stark <stark(at)mit(dot)edu> wrote:
>>
>> I guess you mean insertion sort. What's the theoretical justification
>> for the change?
>
> Well my thinking was that hard coding a series of comparisons would be
> faster than a loop doing a O(n^2) algorithm even for small constants.
> And sort networks are perfect for hard coded sorts because they do the
> same comparisons regardless of the results of previous comparisons so
> there are no branches. And even better the comparisons are as much as
> possible independent of each other -- sort networks are typically
> measured by the depth which assumes any comparisons between disjoint
> pairs can be done in parallel. Even if it's implemented in serial the
> processor is probably parallelizing some of the work.
>
> So I implemented a quick benchmark outside Postgres based on sorting
> actual SortTuples with datum1 defined to be random 64-bit integers (no
> nulls). Indeed the sort networks perform faster on average despite
> doing more comparisons. That makes me think the cpu is indeed doing
> some of the work in parallel.

The open coded version you shared bloats the code by 37kB, I'm not
sure it is pulling it's weight, especially given relatively heavy
comparators. A quick index creation test on int4's profiled with perf
shows about 3% of CPU being spent in the code being replaced. Any
improvement on that is going to be too small to easily quantify.

As the open coding doesn't help with eliminating control flow
dependencies, so my idea is to encode the sort network comparison
order in an array and use that to drive a simple loop. The code size
would be pretty similar to insertion sort and the loop overhead should
mostly be hidden by the CPU OoO machinery. Probably won't help much,
but would be interesting and simple enough to try out. Can you share
you code for the benchmark so I can try it out?

Regards,
Ants Aasma

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Stark 2015-12-12 19:47:16 Re: Using quicksort for every external sort run
Previous Message Jaime Casanova 2015-12-12 19:31:59 Re: Disabling an index temporarily