Fwd: GSOC 2018 Project - A New Sorting Routine

From: Kefan Yang <starordust(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Fwd: GSOC 2018 Project - A New Sorting Routine
Date: 2018-07-13 22:04:55
Message-ID: CAF448YK0xhqK6mg3EeP-dt4j9V-VxkG8Vswmrt3t90OYSrNSsQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

---------- Forwarded message ----------
From: Kefan Yang <starordust(at)gmail(dot)com>
Date: 2018-07-13 15:02 GMT-07:00
Subject: Re: GSOC 2018 Project - A New Sorting Routine
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>

Hey Tomas,

Thanks for your reply!

First I’d like to make some clarification about my test result.

> First of all, testing this on t2.micro is *insane* considering that this
> instance type is subject to throttling (depending on CPU credits). I
> don't know if this happened to be an issue during your tests, of course.
> Furthermore, the instance only has 1 virtual core, so there's likely a
> lot of noise due to other tasks (kernel of whatever needs to run).

I agree that testing this on t2.micro instance might not be a very great
idea. But I have tried 100 rounds for each test case to minimize the
influence of other processes. In fact, I think the result is
consistent with yours on i5-2500k (the older CPU): intro sort with only one
preordered check is slightly better on random cases. There's no big
difference if we also take the other cases into consideration. (I would say
intro sort is still better by simply counting the number of improved test
cases)

Secondly, I see the PDF includes results for various data set types
> (random, reversed, mostly random, ...) but the archive you provided only
> includes the random + killer cases.

I tried data of all patterns in the standalone test, but only random and
killer cases in pg_bench. This is a problem but I would try other cases in
pg_bench once I write a shell script to do it automatically.

And finally, I see the PDF reports "CPU clocks" but I'm not sure what
> that actually is? Is that elapsed time in milliseconds or something else?

Sorry for the confusion, but "CPU clocks" actually means CPU clock ticks,
which are units of time of a constant but system-specific length.

After reading your test results, I think the main problems of intro sort are

1. Slow on CREATE INDEX cases.

I am still trying to figure out where the bottleneck is. Is the data
pattern in index creation very different from other cases? Also, pg_qsort
has 10%-20% advantage at creating index even on sorted data (faster CPU, N
= 1000000). This is very strange to me since the two sorting routines
execute exactly the same code when the input data is sorted.

2. Slow on faster CPU and large data set.

The main difference is still on CREATE INDEX. But there are several SELECT
cases(faster CPU, line 179, 206, 377) where pg_qsort can have more than 60%
advantage, which is crazy. All these cases are dealing with mostly sorted
data.

Personally, I would say intro sort is good as long as it has nearly the
same performance as pg_qsort on average cases because of the better
worst-case complexity. In fact, we can make the two sorting routines as
similar as we want by increasing the threshold that intro sort switches to
heap sort. Therefore, I think by no means could pg_qsort be 10%-20% faster
than a well-optimized intro sort because they execute the same code most of
the time. There must be something special about CREATE INDEX test cases,
and I am trying to figure it out. Also, I am considering performing
the preordered check on every recursion, like what pg_qsort does, and see
how it goes.

Finally, you've mentioned

But even if it was, I guess the easiest way to deal with it would be to
> randomize the selection of pivots.

I am wondering if pg_sort with random pivot selecting could be faster than
intro sort. I've just done some simple tests, which shows that intro sort
in faster in all cases. But I guess it depends on how we would implement
the random number generation.

In conclusion, I still believe intro sort is good if we optimized the
implementation and select the parameters more carefully.

Thanks again for your time. I am hoping to see your thoughts on this:)

Regards,

Kefan

2018-07-12 17:50 GMT-07:00 Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>:

> Hi Kefan,
>
> On 07/10/2018 11:02 PM, Kefan Yang wrote:
> > Hello, Hackers!
> >
> > I am working on my project in Google Summer of Code 2018
> > <https://wiki.postgresql.org/wiki/GSoC_2018#Sorting_algorith
> ms_benchmark_and_implementation_.282018.29>.
> > In this project, I am trying to improve the in-memory sorting routine in
> > PostgreSQL. Now I am very excited to share my progress with you guys.
> >
> > Originally, PostgreSQL is using the QuickSort implemented by J. L.
> > Bentley and M. D. McIlroy in "Engineering a sort function" with some
> > modifications. This sorting routine is very fast, yet may fall to O(n^2)
> > time complexity in the worst case scenario. We are trying to find faster
> > sorting algorithms with guaranteed O(nlogn) time complexity.
> >
>
> Time complexity is nice, but it merely estimates the number of
> comparisons needed by the sort algorithm. It entirely ignores other
> factors that are quite important - behavior with caches, for example.
> And quicksort works really well in this regard, I think.
>
> The worst-case complexity may be an issue, but we're already dealing
> with it by using median-of-three (actually, median-of-nine, IIRC) in
> pg_qsort. Hitting the worst-case accidentally is possible, but it should
> be quite unlikely. It's still deterministic so an adversary might
> construct a data set triggering it and use it for a DDoS, but if that
> was a real issue in practice - I assume we'd already hear about it. But
> even if it was, I guess the easiest way to deal with it would be to
> randomize the selection of pivots.
>
> In other words, replacing quicksort with an algorithm that is slower on
> average but has better worst-case behavior is unlikely to be accepted
> with joy, when the worst case is unlikely / bordering with impossible.
>
>
> > In this patch, I
> >
> > 1. Use IntroSort to implement pg_qsort. IntroSort is a hybrid sorting
> > algorithm. It uses Quicksort most of the time, but switch to
> > insertion sort when the array is small and heapsort when the
> > recursion exceeds depth limit.
> > 2. Only check if the array is preordered once on the whole array to get
> > better overall performance. Previously the sorting routine checks if
> > the array is preordered on every recursion.
> >
> > After some performance test, I find the new sorting routine
> >
> > 1. Slightly faster on sorting random arrays.
> > 2. Much faster on worst case scenario since it has O(nlogn) worst case
> > complexity.
> > 3. Has nearly the same performance on mostly sorted arrays.
> >
> > I use both standalone tests and pgbench to show the result. A more
> > detailed report is in the attachment, along with the patch and some
> > scripts to reproduce the result.
> >
>
> I find those results rather unconvincing.
>
> First of all, testing this on t2.micro is *insane* considering that this
> instance type is subject to throttling (depending on CPU credits). I
> don't know if this happened to be an issue during your tests, of course.
> Furthermore, the instance only has 1 virtual core, so there's likely a
> lot of noise due to other tasks (kernel of whatever needs to run).
>
> Secondly, I see the PDF includes results for various data set types
> (random, reversed, mostly random, ...) but the archive you provided only
> includes the random + killer cases.
>
> And finally, I see the PDF reports "CPU clocks" but I'm not sure what
> that actually is? Is that elapsed time in milliseconds or something else?
>
> So I've done a bit of benchmarking by running the battery of tests I've
> previously used for sort-related patches, and those results seem much
> less optimistic. I've done this on two different x86 machines (one with
> an old i5-2500K CPU, the other one with rather new e5-2620v4). Full
> results and scripts are available at [1] and [2], a summary of the
> results is attached here.
>
> Each spreadsheet has a couple of "comparison N" sheets, where N is the
> number of rows on the test. The last set of columns is comparison to
> unpatched master, where values below 100% mean "faster than master" and
> above 100% "slower than master".
>
> On the (quite old) i5-2500k CPU, there's pretty much no difference
> between master with and without the patch.
>
> On the (much newer) e5-2620v4 system, the results seem somewhat more
> variable - ~10% regressions on CREATE INDEX cases, ~5% gains on the
> other cases, for the smallest data set (10k rows). But as the data set
> grows, the regressions pretty clearly prevail. Not great, I guess :-(
>
> I don't want to discourage you from working on sorting, and I'm sure
> significant improvements in this area are possible (and needed). But my
> guess is that those optimizations will happen at higher level, not by
> tweaking the low-level algorithm.
>
>
> regards
>
> [1] https://bitbucket.org/tvondra/sort-intro-sort-i5/src/master/
> [2] https://bitbucket.org/tvondra/sort-intro-sort/src/master/
>
> --
> Tomas Vondra http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2018-07-13 22:10:18 Re: GSOC 2018 Project - A New Sorting Routine
Previous Message Robert Haas 2018-07-13 21:57:59 Re: pgsql: Add wait event for fsync of WAL segments