Re: Using quicksort for every external sort run

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Using quicksort for every external sort run
Date: 2016-03-22 22:35:39
Message-ID: 389a0605-755b-ad07-cb38-8fd1875d877d@2ndquadrant.com
Lists: pgsql-hackers

Hi,

On 03/22/2016 11:07 PM, Peter Geoghegan wrote:
> On Tue, Mar 22, 2016 at 2:27 PM, Tomas Vondra
> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>> Each query was executed 5x for each work_mem value (between 8MB and 1GB),
>> and then a median of the runs was computed and that's what's on the
>> "comparison". This compares a414d96ad2b without (master) and with the
>> patches applied (patched). The last set of columns is simply a "speedup"
>> where "<1.0" means the patched code is faster, while >1.0 means it's slower.
>> Values below 0.9 or above 1.1 have a green or red background
>> (respectively), to make the most significant improvements or
>> regressions clearly visible.
>>
>> For the smaller data set (1M rows), things work pretty well. There are
>> pretty much no red cells (so no significant regressions), but quite a few
>> green ones (with duration reduced by up to 50%). There are some results in
>> the 1.0-1.05 range, but considering how short the queries are, I don't think
>> this is a problem. Overall the total duration was reduced by ~20%, which is
>> nice.
>>
>> For the 10M data sets, total speedup is also almost ~20%, and the speedups
>> for most queries are also very nice (often ~50%).
>
> To be clear, you seem to mean that ~50% of the runtime of the query
> was removed. In other words, the quicksort version is twice as fast.

Yes, that's what I meant. Sorry for the inaccuracy.
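
To restate the metric: "speedup" is simply the ratio of the median
durations (patched / master), so anything below 1.0 means the patched
code is faster. As a rough sketch - assuming a hypothetical "results"
table with (query, work_mem, branch, duration) columns, duration in
seconds - it's the equivalent of:

    SELECT query, work_mem,
           percentile_cont(0.5) WITHIN GROUP (ORDER BY duration)
             FILTER (WHERE branch = 'patched')
           / percentile_cont(0.5) WITHIN GROUP (ORDER BY duration)
             FILTER (WHERE branch = 'master') AS speedup
      FROM results
     GROUP BY query, work_mem;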

>
>> But the number of regressions is considerably higher - there's a
>> small number of queries that got significantly slower for multiple
>> data sets, particularly for smaller work_mem values.
>
> No time to fully consider these benchmarks right now, but: Did you
> make sure to set replacement_sort_mem very low so that it was never
> used when patched? And, was this on the latest version of the patch,
> where memory contexts were reset (i.e. the version that got
> committed recently)? You said something about memory batching, so
> ISTM that you should set that to '64', to make sure you don't get one
> longer run. That might mess with merging.

I've tested the patch you sent on 2016/3/11, which I believe is the
latest version. I haven't tuned replacement_sort_mem at all, because my
understanding was that it's not 9.6 material (per your message). My
intent was to test the configuration people are likely to use by default.
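
To be concrete, the kind of per-session setup this implies (just a
sketch with a hypothetical table, not the actual benchmark script):

    SET work_mem = '8MB';  -- swept from 8MB up to 1GB
    -- replacement_sort_mem left at its default on the patched build;
    -- your suggestion would be e.g. SET replacement_sort_mem = 64;
    SELECT * FROM sort_test ORDER BY val;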

I'm not sure about the batching - that was merely a guess at what the
problem might be.

>
> Note that the master branch has the memory batching patch as of a
> few days back, so if that's the problem at the low end, then that's
> bad.

I'm not sure which commit you are referring to. The benchmark was done
on a414d96a (from 2016/3/10). However, I'd expect that to affect both
sets of measurements, although it's possible that it affects the
patched version differently.

> But I don't think it is: I think that the regressions at the low end
> are about abbreviated keys, particularly the numeric cases. There is
> a huge gulf in the cost of those comparisons (abbreviated vs
> authoritative), and it is legitimately a weakness of the patch that
> it reduces the number in play. I think it's still well worth it, but
> it is a downside. There is no reason why the authoritative numeric
> comparator has to allocate memory, but right now that case isn't
> optimized.

Yes, numeric and text are the most severely affected cases.
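
For illustration (hypothetical table/column, just to show the kind of
query that's affected - assuming it's large enough to spill to disk),
a low-memory numeric sort like:

    SET work_mem = '8MB';
    SELECT * FROM sort_test ORDER BY numeric_col;

is, as I understand your explanation, exactly where a larger share of
the comparisons end up going through the expensive authoritative
numeric comparator rather than the cheap abbreviated one.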

>
> I find it weird that the patch is exactly the same as master in a
> lot of cases. ISTM that with a case where you use 1GB of memory to
> sort 1 million rows, you're so close to an internal sort that it
> hardly matters (master will not need a merge step at all, most
> likely). The patch works best with sorts that take tens of seconds,
> and I don't think I see any here, nor any high memory tests where RS
> flops. Now, I think you focused on regressions because that was what
> was interesting, which is good. I just want to put that in context.

I don't think the tests on 1M rows are particularly interesting, and I
don't see any noticeable regressions there. Perhaps you mean the tests
on 10M rows instead?

Yes, you're correct - I was mostly looking for regressions. However,
the worst regressions are on relatively long sorts, e.g. slowing down
from 35 seconds to 64 seconds. So those are quite long, and they're
surely using a non-trivial amount of memory. Or am I missing something?

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
