Re: Using quicksort for every external sort run

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: Using quicksort for every external sort run
Date: 2015-11-25 00:33:19
Message-ID: CAM3SWZSsgVvriLsaPYUec-iSGfHzMXsWrnQ4ZhUTtBaGjRbWBg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Nov 24, 2015 at 3:32 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> My feeling is that numbers rarely speak for themselves, without LSD. (Which
> numbers?)

Guffaw.

> How are we doing here? Keen to see this work get committed, so we can move
> onto parallel sort. What's the summary?

I showed a test case where a CREATE INDEX sort involving 5 runs and a
merge only took about 18% longer than an equivalent fully internal
sort [1] using over 5 times the memory. That's about 2.5X faster than
the 9.5 performance on the same system with the same amount of memory.

Overall, the best cases I saw were the original "quicksort with
spillover" cases [2]. They were just under 4X faster. I care about
that less, though, because that will happen way less often, and won't
help with larger sorts that are even more CPU bound.

There is a theoretical possibility that this is slower on systems
where multiple merge passes are required as a consequence of not
having runs as long as possible (due to not using replacement
selection heap). That will happen very infrequently [3], and is very
probably still worth it.

So, the bottom line is: This patch seems very good, is unlikely to
have any notable downside (no case has been shown to be regressed),
but has yet to receive code review. I am working on a new version with
the first two commits consolidated, and better comments, but that will
have the same code, unless I find bugs or am dissatisfied. It mostly
needs thorough code review, and to a lesser extent some more
performance testing.

Parallel sort is very important. Robert, Amit and I had a call about
this earlier today. We're all in agreement that this should be
extended in that direction, and have a rough idea about how it ought
to fit together with the parallelism primitives. Parallel sort in 9.6
could certainly happen -- that's what I'm aiming for. I haven't really
done preliminary research yet; I'll know more in a little while.

> How about we commit it with a sort_algorithm = 'foo' parameter so we can
> compare things before release of 9.6?

I had a debug GUC (like the existing one to disable top-N heapsorts)
that disabled "quicksort with spillover". That's almost the opposite
of what you're asking for, though, because that makes us never use a
heap. You're asking for me to write a GUC to always use a heap.

That's not a good way of testing this patch, because it's inconvenient
to consider the need to use a heap beyond the first run (something
that now exists solely for the benefit of "quicksort with spillover";
a heap will often never be used even for the first run). Besides, the
merge optimization is a big though independent part of this, and
doesn't make sense to control with the same GUC.

If I haven't gotten this right, we should not commit the patch. If the
patch isn't superior to the existing approach in virtually every way,
then there is no point in making it possible for end-users to disable
with messy GUCs -- it should be reverted.

[1] Message: http://www.postgresql.org/message-id/CAM3SWZRiHaF7jdf923ZZ2qhDJiErqP5uU_+JPuMvUmeD0z9fFA@mail.gmail.com
Attachment:
http://www.postgresql.org/message-id/attachment/39660/quicksort_external_test.txt

[2] http://www.postgresql.org/message-id/CAM3SWZTzLT5Y=VY320NznAyz2z_em3us6x=7rXMEUma9Z9yN6Q@mail.gmail.com

[3] http://www.postgresql.org/message-id/CAM3SWZTX5=nHxPpogPirQsH4cR+BpQS6r7Ktax0HMQiNLf-1qA@mail.gmail.com
--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jim Nasby 2015-11-25 00:44:24 Re: parallelism and sorting
Previous Message Jim Nasby 2015-11-25 00:32:35 Re: [PROPOSAL] VACUUM Progress Checker.