Re: The case for removing replacement selection sort

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Greg Stark <stark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: The case for removing replacement selection sort
Date: 2017-09-11 15:17:09
Message-ID: 2f9151f9-7908-ab84-4cbd-4e1a10a75077@2ndquadrant.com

Hi, attached are some initial numbers from the two machines I'm using
for testing (the usual ones - old i5-2500k, new e5-2620), for the two
smaller data sets.

Overall I think the results show a quite significant positive impact of
the patch. There are a few cases of regression, but ISTM those may
easily be noise, as it's usually 0.03 vs 0.04 seconds or so. I'll
switch to \timing (instead of /usr/bin/time) to get more accurate
results, and rerun those tests.
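
To illustrate why a 0.03 vs 0.04 reading may be mostly noise, here is a
minimal sketch. It assumes the timer reports durations rounded to
hundredths of a second (an assumption about the output format, not
something verified against /usr/bin/time), and the durations and helper
names are made up for illustration:

```python
def reported_centiseconds(seconds):
    """Duration as a coarse timer would print it, in hundredths of a second.

    Assumption: the timer rounds to the nearest 0.01 s.
    """
    return round(seconds * 100)


def relative_change(before, after):
    """Relative change from before to after (0.1 means 10% slower)."""
    return (after - before) / before


# Two hypothetical true durations that differ by only 0.2 ms.
master = 0.0349
patched = 0.0351

true_change = relative_change(master, patched)
shown_change = relative_change(reported_centiseconds(master),
                               reported_centiseconds(patched))

print(f"true change:     {true_change:.1%}")
print(f"reported change: {shown_change:.1%}")
```

With these made-up inputs the true difference is well under 1%, while
the rounded readings (3 vs 4 hundredths) suggest a slowdown of about a
third, so a finer-grained timer is needed for runs this short.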

FWIW, I've been running the tests with trace_sort, so we can inspect the
server log if needed for individual cases. I've pushed the data to

https://bitbucket.org/tvondra/sort-benchmarks-2017/src

On 09/11/2017 03:39 AM, Peter Geoghegan wrote:
> On Sun, Sep 10, 2017 at 5:59 PM, Tomas Vondra
> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>
> [snip]
>
> To be clear, you'll still need to set replacement_sort_tuples high
> when testing RS, to make sure that we really use it for at least the
> first run when we're expected to. (There is no easy way to have
> testing mechanically verify that we really do only have one run in the
> end with RS, but I assume that such paranoia is unneeded.)
>

OK. I'll probably try both, at least with the two small datasets.
Doesn't hurt, and perhaps the numbers will be interesting.

>> I probably won't eliminate the random/DESC data sets, though. At
>> least not from the two smaller data sets - I want to do a bit of
>> benchmarking on Heikki's polyphase merge removal patch, and for
>> that patch those data sets are still relevant. Also, it's useful to
>> have a subset of results where we know we don't expect any change.
>
> Okay. The DESC dataset is going to make my patch look good, because
> it won't change anything for merging (same number of runs in the
> end), but sorting will be slower for the first run with RS.
>

Well, then I think it's a useful test and we should not exclude it. I
assume there will be a few cases where the patch causes a regression,
and to judge the overall impact of the patch it's useful to also
quantify the positive cases (even if we expect the improvements).

>> Meh, more data is probably better. And with the reduced work_mem
>> values and skipping of random/DESC data sets it should complete
>> much faster.
>
> Note that my own test case had a much higher number of tuples than
> even 10 million -- it had 100 million tuples. I think that if any of
> your test cases bring about a new insight, it will not be due to the
> number of distinct tuples. But, if the extra time it takes doesn't
> matter to you, then it doesn't matter to me either.
>

I wouldn't say the extra time does not matter, but I think it would be
good to get some initial results quickly. So I'll focus on the two
smaller data sets for now, and then perhaps run the larger tests.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
e5-2620-v4.ods application/vnd.oasis.opendocument.spreadsheet 766.3 KB
i5-2500k.ods application/vnd.oasis.opendocument.spreadsheet 592.8 KB
