Re: Fwd: GSOC 2018 Project - A New Sorting Routine

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Kefan Yang <starordust(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Fwd: GSOC 2018 Project - A New Sorting Routine
Date: 2018-07-14 01:03:21
Message-ID: 2af33cf9-16cf-df04-e8c4-6ab33a1b6f68@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 07/14/2018 12:04 AM, Kefan Yang wrote:
>
> ...
>
> And finally, I see the PDF reports "CPU clocks" but I'm not sure what
> that actually is? Is that elapsed time in milliseconds or something
> else?
>
>
> Sorry for the confusion, but "CPU clocks" actually means CPU clock
> ticks, which are units of time of a constant but system-specific length.
>

OK, how do you measure this metric?

> After reading your test results, I think the main problems of intro sort are
>
> 1. Slow on CREATE INDEX cases.
>
> I am still trying to figure out where the bottleneck is. Is the data
> pattern in index creation very different from other cases? Also,
> pg_qsort has 10%-20% advantage at creating index even on sorted data
> (faster CPU, N = 1000000). This is very strange to me since the two
> sorting routines execute exactly the same code when the input data is
> sorted.
>
> 2. Slow on faster CPU and large data set.
>
> The main difference is still on CREATE INDEX.  But there are several
> SELECT cases(faster CPU, line 179, 206, 377) where pg_qsort can have
> more than 60% advantage, which is crazy. All these cases are dealing
> with mostly sorted data. 
>
> Personally, I would say intro sort is good as long as it has nearly the
> same performance as pg_qsort on average cases because of the better
> worst-case complexity. In fact, we can make the two sorting routines as
> similar as we want by increasing the threshold that intro sort switches
> to heap sort. Therefore, I think by no means could pg_qsort be 10%-20%
> faster than a well-optimized intro sort because they execute the same
> code most of the time. There must be something special about CREATE
> INDEX test cases, and I am trying to figure it out. Also, I am
> considering performing the preordered check on every recursion, like
> what pg_qsort does, and see how it goes.
>

I don't know. All I have is the results I shared. I suggest you try
reproducing the tests on your system - the scripts are in the git
repositories.

> Finally, you've mentioned
>
> But even if it was, I guess the easiest way to deal with it would be to
> randomize the selection of pivots.
>
>
> I am wondering if pg_sort with random pivot selecting could be faster
> than intro sort. I've just done some simple tests, which shows that
> intro sort in faster in all cases. But I guess it depends on how we
> would implement the random number generation.
>

Unlikely. The pivot randomization is merely a way to defeat an adversary
attempting to perform DoS by triggering sorts on a killer sequence.
Randomization makes it much harder/impossible, because the killer
sequence changes over time. It's not a regular performance optimization.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2018-07-14 01:24:36 Re: Fwd: GSOC 2018 Project - A New Sorting Routine
Previous Message Pierre Ducroquet 2018-07-14 00:00:22 Re: [PATCH] LLVM tuple deforming improvements