Re: The case for removing replacement selection sort

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>, Greg Stark <stark(at)mit(dot)edu>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: The case for removing replacement selection sort
Date: 2017-09-11 00:07:36
Message-ID: 131a54e5-c8ea-89e7-6f95-445d07a24972@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On 09/11/2017 01:03 AM, Peter Geoghegan wrote:
> On Sat, Sep 9, 2017 at 8:28 AM, Greg Stark <stark(at)mit(dot)edu> wrote:
>> This may be a bit "how long is a piece of string" but how do those two
>> compare with string sorting in an interesting encoding/locale -- say
>> /usr/share/dict/polish in pl_PL for example. It's certainly true that
>> people do sort text as well as numbers.
>
> I haven't looked at text, because I presume that it's very rare for
> tuples within a table to be more or less ordered by a text
> attribute. While the traditional big advantage of replacement
> selection has always been its ability to double initial run length on
> average, where text performance is quite interesting because
> localized clustering still happens, that doesn't really seem relevant
> here. The remaining use of replacement selection is expressly only
> about the "single run, no merge" best case, and in particular about
> avoiding regressions when upgrading from versions prior to 9.6 (9.6
> is the version where we began to generally prefer quicksort).
>
>> Also, people often sort on keys of more than one column....
>
> Very true. I should test this.
>

I'm currently re-running the benchmarks we did in 2016 for 9.6, but
those are all sorts with a single column (see the attached script). But
it'd be good to add a few queries testing sorts with multiple keys. We
can either tweak some of the existing data sets + queries, or come up
with entirely new tests.

The current tests construct data sets with different features (unique or
low/high-cardinality, random/sorted/almost-sorted). How should that work
for multi-key sorts? Same features for all columns, or some mix?

For the existing queries, I should have some initial results tomorrow,
at least for the data sets with 100k and 1M rows. The tests with 10M
rows will take much more time (it takes 1-2hours for a single work_mem
value, and we're testing 6 of them).

But perhaps we can reduce the number of tests somehow, only testing the
largest/smallest work_mem values? So that we could get some numbers now,
possibly running the complete test suite later.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
sort-bench.sh application/x-shellscript 15.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Janes 2017-09-11 00:10:17 Re: Automatic testing of patches in commit fest
Previous Message Michael Paquier 2017-09-10 23:40:14 Automatic testing of patches in commit fest