Re: The case for removing replacement selection sort

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Greg Stark <stark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: The case for removing replacement selection sort
Date: 2017-09-11 01:39:20
Message-ID: CAH2-Wzn0txhFkaw2tFq_qiMMtLd6QbxWSo80vRzbQr8eYkPctw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Sep 10, 2017 at 5:59 PM, Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> OK, so 1MB, 4MB, 8MB, 32MB?

Right.

> Ah, so you suggest doing all the tests on current master, by only
> tweaking the replacement_sort_tuples value? I've been testing master vs.
> your patch, but I guess setting replacement_sort_tuples=0 should have
> the same effect.

I think that there may be a very slight advantage to RS-free
performance with my patch actually applied, because it removes the
number of instructions that heap maintenance (merging) routines
consist of. This is due to my removing HEAPCOMPARE()/tupindex
handling. However, it's probably a low single digit percentage
difference -- not enough to be worth taking into account, probably. I
assume that just setting replacement_sort_tuples to zero is more
convenient for you (that's what I did).

To be clear, you'll still need to set replacement_sort_tuples high
when testing RS, to make sure that we really use it for at least the
first run when we're expected to. (There is no easy way to have
testing mechanically verify that we really do only have one run in the
end with RS, but I assume that such paranoia is unneeded.)

> I probably won't eliminate the random/DESC data sets, though. At least
> not from the two smaller data sets - I want to do a bit of benchmarking
> on Heikki's polyphase merge removal patch, and for that patch those data
> sets are still relevant. Also, it's useful to have a subset of results
> where we know we don't expect any change.

Okay. The DESC dataset is going to make my patch look good, because it
won't change anything for merging (same number of runs in the end),
but sorting will be slower for the first run with RS.

> Meh, more data is probably better. And with the reduced work_mem values
> and skipping of random/DESC data sets it should complete much faster.

Note that my own test case had a much higher number of tuples than
even 10 million -- it had 100 million tuples. I think that if any of
your test cases bring about a new insight, it will not be due to the
number of distinct tuples. But, if the extra time it takes doesn't
matter to you, then it doesn't matter to me either.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2017-09-11 01:48:00 Re: Setting pd_lower in GIN metapage
Previous Message Michael Paquier 2017-09-11 01:24:19 Re: Setting pd_lower in GIN metapage