Re: Is tuplesort_heap_siftup() a misnomer?

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Peter Geoghegan <pg(at)heroku(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Is tuplesort_heap_siftup() a misnomer?
Date: 2016-09-08 07:01:40
Message-ID: 0b4f0f78-28d3-e99f-70bb-962cf8a8025c@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 09/08/2016 03:36 AM, Peter Geoghegan wrote:
> On Wed, Sep 7, 2016 at 2:42 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> The reason it's called siftup is that that's what Knuth calls it.
>> See Algorithm 5.2.3H (Heapsort), pp 146-147 in the first edition of
>> Volume 3; tuplesort_heap_siftup corresponds directly to steps H3-H8.
>
> I see that Knuth explains that these steps form the siftup procedure.
> What steps does tuplesort_heap_insert correspond to, if any?

Hmm. The loop in tuplesort_heap_siftup() indeed looks the same as
Knuth's steps H3-H8. But the start is different. Knuth's algorithm
begins with the situation that the top node of the sub-heap is in wrong
place, and it pushes it downwards, until the whole sub-heap satisfies
the heap condition again. But tuplesort_heap_siftup() begins by moving
the last value in the heap to the top, and then it does the H3-H8 steps
to move it to the right place.

Using Wikipedia's terms, tuplesort_heap_siftup() function as whole is
the Extract operation. And the loop inside it corresponds to the
Max-Heapify operation, which is the same as Knuth's "siftup".

I still think tuplesort_heap_siftup is a confusing name, although I'm
not sure that Peter's "compact" is much better. I suggest that we rename
it to tuplesort_heap_delete_top(). In comments within the function,
explain that the *loop* corresponds to the "siftup" in Knuth's book.

Interestingly, as Knuth explains the siftup algorithm, it performs a
"replace the top" operation, rather than "remove the top". But we use it
to remove the top, by moving the last element to the top and running the
algorithm after that. Peter's other patch, to change the way we
currently replace the top node, to do it in one pass rather than as
delete+insert, is actually Knuth's siftup algorithm.

Peter, looking at your "displace" patch in this light, I think
tuplesort_heap_root_displace() and tuplesort_heap_delete_top() (as I'm
calling it now), should share a common subroutine. Displace replaces the
top node with the new node passed as argument, and then calls the
subroutine to push it down to the right place. Delete_top replaces the
top node with the last value in the heap, and then calls the subroutine.
Or perhaps delete_top should remove the last value from the heap, and
then call displace() with it, to re-insert it.

- Heikki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tsunakawa, Takayuki 2016-09-08 07:09:51 Re: Supporting SJIS as a database encoding
Previous Message Noah Misch 2016-09-08 07:00:58 Re: Suggestions for first contribution?