Quick Links

Re: Minor performance improvement in transition to external sort

From:	Jeremy Harris <jgh(at)wizmail(dot)org>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Minor performance improvement in transition to external sort
Date:	2014-02-09 17:11:30
Message-ID:	52F7B6C2.2000705@wizmail.org
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 06/02/14 18:21, Jeff Janes wrote:
> Did you try sorting already-sorted, reverse
> sorted, or pipe-organ shaped data sets? We will also need to test it on
> strings. I usually use md5(random()::text) to generate strings for such
> purposes, at least for a first pass.

Attached is version 2 of the patch, which fixes the performance on
constant-input.

Summary (low numbers better, vs 858ec11):

Random ints: 83% compares, level on time.
Sorted ints: level compares, 70% time.
Reverse-sorted ints: 10% compares, 15% time (!)
Constant ints: level compares, 75% time
Pipe-organ ints: 80% compares, 107% time
Random text: 83% compares, 106% time

> Another optimization that is possible is to do only one comparison at each
> level (between the two children) all the way down the heap, and then
> compare the parent to the chain of smallest children in reverse order.
> This can substantially reduce the number of comparisons on average,
> because most tuples sift most of the way down the heap (because most of the
> tuples are at the bottom of the heap). Robert submitted a patch to do this
> in the main siftdown routine (which for some reason is
> named tuplesort_heap_siftup, rather than siftdown), and I found it a
> substantial improvement but it never got pushed forward. I think Robert
> was concerned it might make some obscure cases worse rather than better,
> and no one put it through enough serious testing to overcome that concern.

I've done an implementation of this (also attached: siftdown_bounce).
Summary:

Random ints: 72% compares, 104% time.
Sorted ints: 200% compares, 270% time. (ouch)
Reverse-sorted ints: 7% compares, 12% time
Constant ints: 150% compares, 275% time
Pipe-organ ints: 93% compares, 115% time
Random text: 72% compares, 91% time

- I don't like the performance on already-sorted input. Thinking
into this, a sorted array is already a well-formed heap so optimal
behaviour is a single compare per item (lacking the global knowledge).
This "bounce" method will always do a full walk down the tree and
back, hence the poor behaviour. Unless the planner can tell
the sort when sorted input is possible I don't think we dare use
this despite the benefit on random input.

The tests were run with an instrumented variant of each patch, counting
compares and time for just the heapify portion of the inittapes()
routine. Test script (jgh.sql and results spreadsheet) also attached.
--
Cheers,
Jeremy

Attachment	Content-Type	Size
heapify_siftdown_v2.patch	text/x-patch	4.1 KB
siftdown_bounce.patch	text/x-patch	4.4 KB
jgh.sql	application/sql	21.5 KB
siftdown_performance.ods	application/vnd.oasis.opendocument.spreadsheet	32.1 KB

In response to

Re: Minor performance improvement in transition to external sort at 2014-02-06 18:21:16 from Jeff Janes

Responses

Re: Minor performance improvement in transition to external sort at 2014-02-21 00:27:28 from Jeremy Harris

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Robert Haas	2014-02-09 17:31:46	Re: specifying repeatable read in PGOPTIONS
Previous Message	Andres Freund	2014-02-09 17:10:22	Re: specifying repeatable read in PGOPTIONS