Re: The case for removing replacement selection sort

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: The case for removing replacement selection sort
Date: 2017-09-08 17:06:10
Message-ID: CAH2-WzkoKKtqOhDhR0MuG9yhtN7vkxi5PY4gDa0DY2DTJf486w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Sep 7, 2017 at 2:49 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, Sep 7, 2017 at 5:38 PM, Tomas Vondra
> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>> Do we need/want to repeat some of that benchmarking on these patches? I
>> don't recall how much this code changed since those benchmarks were done
>> in the 9.6 cycle.
>
> +1 for some new benchmarking. I'm all for removing this code if we
> don't need it any more, but it'd be a lot better to have more numbers
> before deciding that.

It isn't always a win. For example, if I alter the pgbench_accounts
table so that the column "aid" is of type numeric, the picture changes
for my test case -- replacement selection is still somewhat faster for
the "select count(distinct aid) from pgbench_accounts" query with
work_mem=2MB. It takes about 5.4 seconds with replacement selection,
versus 7.4 seconds for quicksorting. But, I still think we should
proceed with my patch, because:

* It's still faster with int4/int8 comparisons on modern hardware, and
I think that most ordered inputs will be of those types in practice.
The trade-off between those two properties (faster for int4/int8 when
ordered, slower for numeric) recommends killing replacement selection
entirely. It's not a slam dunk, but it makes sense on performance
grounds, IMV.

* The comparator numeric_fast_cmp() is not well optimized -- it
doesn't avoid allocating memory with each call, for example, unlike
varstrfastcmp_locale(). This could and should be improved, and might
even change the picture for this second test case.

* With the default work_mem of 8MB, we never use replacement selection
anyway. Whatever its merits, it's pretty rare to use replacement
selection simply because the default replacement_sort_tuples just
isn't that many tuples (150,000).

* The upside of my patch is not inconsiderable: We can remove a lot of
code, which will enable further improvements in the future, which are
far more compelling (cleaner memory management, the use of memory
batches during run generation).

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2017-09-08 17:14:50 Re: PoC plpgsql - possibility to force custom or generic plan
Previous Message Bossart, Nathan 2017-09-08 17:05:14 Re: [Proposal] Allow users to specify multiple tables in VACUUM commands