Re: The case for removing replacement selection sort

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Greg Stark <stark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: The case for removing replacement selection sort
Date: 2017-09-11 00:22:12
Message-ID: CAH2-Wz=XuK6WSUpXX-7=y7n8XmDVV5YVFTUoXfWrgWt+Djk8Xw@mail.gmail.com
Lists: pgsql-hackers

On Sun, Sep 10, 2017 at 5:07 PM, Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> I'm currently re-running the benchmarks we did in 2016 for 9.6, but
> those are all sorts with a single column (see the attached script). But
> it'd be good to add a few queries testing sorts with multiple keys. We
> can either tweak some of the existing data sets + queries, or come up
> with entirely new tests.

I see that work_mem is set like this in the script:

"for wm in '1MB' '8MB' '32MB' '128MB' '512MB' '1GB'; do"

I suggest that we forget about values over 32MB, since the question of
how well quicksort does there was settled by your tests in 2016. I
would also add '4MB' to the list of wm values that you'll actually
test.
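
In other words, something like this (a sketch of the revised loop,
keeping the script's structure; the loop body is a placeholder):

    # drop everything above 32MB; add '4MB' to cover the low end
    for wm in '1MB' '4MB' '8MB' '32MB'; do
        # ... run each query with "SET work_mem = '$wm';" prepended,
        # as the script already does
    done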

Any case with input that is initially in random order or DESC sort
order is not interesting, either. I suggest you remove those, too.

I think we're only interested in benchmarks where replacement
selection really does get its putative best case (no merge needed at
the end). Any (almost) sorted case (the only kind of case that is
interesting to test now) will always manage that, once you set
replacement_sort_tuples high enough, provided there isn't even a
single tuple that is completely out of order. The "before" cases here
should have a replacement_sort_tuples of 1 billion (so that we're
sure the limit can't prevent the use of replacement selection in the
first place), versus the "after" cases, which should have a
replacement_sort_tuples of 0, representing my proposal (performance
in a world where replacement selection is totally removed).
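
Concretely, each benchmarked query could be run under the two
settings like this (just a sketch; "almost_sorted" and the query
itself stand in for whatever your script actually runs):

    # "before": limit too high to ever prevent replacement selection
    psql -c "SET replacement_sort_tuples = 1000000000;
             SELECT * FROM almost_sorted ORDER BY a OFFSET 100000000;"

    # "after": 0 disables replacement selection, as if it were removed
    psql -c "SET replacement_sort_tuples = 0;
             SELECT * FROM almost_sorted ORDER BY a OFFSET 100000000;"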

> For the existing queries, I should have some initial results tomorrow,
> at least for the data sets with 100k and 1M rows. The tests with 10M
> rows will take much more time (it takes 1-2 hours for a single work_mem
> value, and we're testing 6 of them).

I myself don't see that much value in a 10M row test.

Thanks for volunteering to test!
--
Peter Geoghegan
