From: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
---|---|
To: | Kefan Yang <starordust(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Fwd: GSOC 2018 Project - A New Sorting Routine |
Date: | 2018-07-14 01:03:21 |
Message-ID: | 2af33cf9-16cf-df04-e8c4-6ab33a1b6f68@2ndquadrant.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 07/14/2018 12:04 AM, Kefan Yang wrote:
>
> ...
>
> And finally, I see the PDF reports "CPU clocks" but I'm not sure what
> that actually is? Is that elapsed time in milliseconds or something
> else?
>
>
> Sorry for the confusion, but "CPU clocks" actually means CPU clock
> ticks, which are units of time of a constant but system-specific length.
>
OK, how do you measure this metric?
> After reading your test results, I think the main problems of intro sort are
>
> 1. Slow on CREATE INDEX cases.
>
> I am still trying to figure out where the bottleneck is. Is the data
> pattern in index creation very different from other cases? Also,
> pg_qsort has 10%-20% advantage at creating index even on sorted data
> (faster CPU, N = 1000000). This is very strange to me since the two
> sorting routines execute exactly the same code when the input data is
> sorted.
>
> 2. Slow on faster CPU and large data set.
>
> The main difference is still on CREATE INDEX. But there are several
> SELECT cases(faster CPU, line 179, 206, 377) where pg_qsort can have
> more than 60% advantage, which is crazy. All these cases are dealing
> with mostly sorted data.
>
> Personally, I would say intro sort is good as long as it has nearly the
> same performance as pg_qsort on average cases because of the better
> worst-case complexity. In fact, we can make the two sorting routines as
> similar as we want by increasing the threshold that intro sort switches
> to heap sort. Therefore, I think by no means could pg_qsort be 10%-20%
> faster than a well-optimized intro sort because they execute the same
> code most of the time. There must be something special about CREATE
> INDEX test cases, and I am trying to figure it out. Also, I am
> considering performing the preordered check on every recursion, like
> what pg_qsort does, and see how it goes.
>
I don't know. All I have is the results I shared. I suggest you try
reproducing the tests on your system - the scripts are in the git
repositories.
> Finally, you've mentioned
>
> But even if it was, I guess the easiest way to deal with it would be to
> randomize the selection of pivots.
>
>
> I am wondering if pg_sort with random pivot selecting could be faster
> than intro sort. I've just done some simple tests, which shows that
> intro sort in faster in all cases. But I guess it depends on how we
> would implement the random number generation.
>
Unlikely. The pivot randomization is merely a way to defeat an adversary
attempting to perform DoS by triggering sorts on a killer sequence.
Randomization makes it much harder/impossible, because the killer
sequence changes over time. It's not a regular performance optimization.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Geoghegan | 2018-07-14 01:24:36 | Re: Fwd: GSOC 2018 Project - A New Sorting Routine |
Previous Message | Pierre Ducroquet | 2018-07-14 00:00:22 | Re: [PATCH] LLVM tuple deforming improvements |