Re: Using quicksort for every external sort run

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Using quicksort for every external sort run
Date: 2015-11-30 20:29:51
Message-ID: CAM3SWZRY9vQx1P+YKPedKyNwd6cf4_KhDntfir=ReKW60C5L3g@mail.gmail.com
Lists: pgsql-hackers

On Mon, Nov 30, 2015 at 9:51 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>> As I said, it seems a little bit unfair to hand-tune work_mem or
>> maintenance_work_mem like that. Who can afford to do that? I think you
>> agree that it's untenable to have DBAs allocate work_mem differently
>> for cases where an internal sort or external sort is expected;
>> workloads are just far too complicated and changeable.
>
> Right, I agree with all that. But I think it is important to know
> where the benefits come from. It looks like about half comes from
> being more robust to overly-large memory usage, and half from absolute
> improvements which you get at each implementation's own best setting.
> Also, if someone had previously restricted work_mem (or more likely
> maintenance_work_mem) simply to avoid the large memory penalty, they
> need to know to revisit that decision. Although they still don't get
> any actual benefit from using too much memory, just a reduced penalty.

Well, to be clear, they do get a benefit with much larger memory
sizes. It's just that the benefit does not continue indefinitely. I
agree with this assessment, though.

> I'm kind of curious as to why the optimum for the patched code appears
> at 1GB and not lower. If I get a chance to rebuild the test, I will
> look into that more.

I think that the availability of abbreviated keys (or something that
allows most comparisons made by quicksort/the heap to be resolved at
the SortTuple level) could make a big difference for things like this.
Bear in mind that the merge phase has better cache characteristics
when many attributes must be compared, rather than mostly just the
leading attribute. AlphaSort [1] merges in-memory runs (built with
quicksort) to create on-disk runs for this reason. (I tried that, and
it didn't help -- maybe I already get that benefit from merging
on-disk runs, since modern machines have so much more memory than
they did in 1994.)
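
To make the SortTuple-level point concrete, here is a simplified,
compilable sketch -- not the actual tuplesort.c definitions; the
struct and names here are invented for illustration. With the
abbreviated key stored inline next to the tuple pointer, quicksort can
resolve most comparisons without ever dereferencing the pointer:

#include <stdint.h>
#include <string.h>

typedef struct
{
    const char *tuple;       /* full key; lives in "cold" memory */
    uint64_t    abbrev_key;  /* abbreviated key, packed inline; "hot" */
} SortTupleSketch;

static int
sorttuple_cmp(const void *va, const void *vb)
{
    const SortTupleSketch *a = va;
    const SortTupleSketch *b = vb;

    /* Most comparisons are decided here, on memory already in cache. */
    if (a->abbrev_key != b->abbrev_key)
        return (a->abbrev_key < b->abbrev_key) ? -1 : 1;

    /* Abbreviated keys tied: authoritative full comparison. */
    return strcmp(a->tuple, b->tuple);
}

When leading attributes are distinctive, nearly every comparison takes
the first branch, so the quicksort mostly touches the SortTuple array
itself rather than scattered tuple memory.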

> It has a Perc H710 RAID controller with 15,000 RPM drives, but it is
> also a virtualized system that has other stuff going on. The disks
> are definitely better than your average household computer, but I
> don't think they are anything special as far as real database hardware
> goes.

What I meant was that it's better than my laptop. :-)

> What would be next in reviewing the patches? Digging into the C-level
> implementation?

Yes, certainly, but let me post a revised version first. I have
improved the comments and consolidated some of the commits.

Also, I am going to get a bunch of test results from the POWER7
system. I think I might see more benefits at higher
maintenance_work_mem settings than you saw, primarily because my case
can mostly just use abbreviated keys during the quicksort operations.
Also, I find it very encouraging that while (for example) your 3GB
test case was slower than your 1GB test case, it was only 5% slower. I
have a lot of hope that we can have a cost model for sizing an
effective maintenance_work_mem for this reason -- the consequences of
being wrong are really not that severe. It's unfortunate that we
currently waste so much memory by blindly adhering to
work_mem/maintenance_work_mem. This will matter a lot more once we
have parallel sort.
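
To illustrate the kind of cost model I mean, here is a toy sketch --
the function, names, and constants are all invented for illustration,
not something from the patch:

#include <stddef.h>

/*
 * Toy heuristic: pick an amount of sort memory to actually use, given
 * an estimate of the input size and the configured budget.  Because
 * over- or under-sizing costs only a few percent (per the numbers
 * above), a rough input_bytes estimate is tolerable.
 */
static size_t
effective_sort_mem(size_t input_bytes, size_t budget_bytes)
{
    /* Allow some slop for the SortTuple array and bookkeeping. */
    size_t internal_need = input_bytes + input_bytes / 4;

    /* Internal sort: take only what the input actually needs. */
    if (internal_need <= budget_bytes)
        return internal_need;

    /* External sort: beyond some point extra memory buys little. */
    return budget_bytes;
}

The point is just that a rough estimate is tolerable precisely because
the penalty for guessing wrong is a few percent, not a cliff.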

[1] http://www.cs.berkeley.edu/~rxin/db-papers/alphasort.pdf
--
Peter Geoghegan
