Re: Is tuplesort_heap_siftup() a misnomer?

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Is tuplesort_heap_siftup() a misnomer?
Date: 2016-09-08 17:19:10
Message-ID: CAM3SWZTEENdg1TvjOLRpDk0sA2tPDc9tNrk6_Oq-GjCvd2f9Yw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Sep 8, 2016 at 12:01 AM, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> I still think tuplesort_heap_siftup is a confusing name, although I'm not
> sure that Peter's "compact" is much better. I suggest that we rename it to
> tuplesort_heap_delete_top(). In comments within the function, explain that
> the *loop* corresponds to the "siftup" in Knuth's book.

I'm also fine with that name.

> Interestingly, as Knuth explains the siftup algorithm, it performs a
> "replace the top" operation, rather than "remove the top". But we use it to
> remove the top, by moving the last element to the top and running the
> algorithm after that. Peter's other patch, to change the way we currently
> replace the top node, to do it in one pass rather than as delete+insert, is
> actually Knuth's siftup algorithm.

Knuth must have a strange sense of humor.

> Peter, looking at your "displace" patch in this light, I think
> tuplesort_heap_root_displace() and tuplesort_heap_delete_top() (as I'm
> calling it now), should share a common subroutine. Displace replaces the top
> node with the new node passed as argument, and then calls the subroutine to
> push it down to the right place. Delete_top replaces the top node with the
> last value in the heap, and then calls the subroutine. Or perhaps delete_top
> should remove the last value from the heap, and then call displace() with
> it, to re-insert it.

I can see the value in that, but I'd still rather not. The code reuse
win isn't all that large, and having a separate
tuplesort_heap_root_displace() routine allows us to explain what's
going on for merging, to assert what we're able to assert with tape
numbers vs. heap position, and to lose the HEAPCOMPARE()/checkIndex
cruft that the existing routines need (if only barely) to support
replacement selection. I would be surprised if with a very tight inner
loop like this, HEAPCOMPARE() has an appreciable overhead (maybe it
increases pipeline stalls).

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2016-09-08 17:23:17 Re: Parallel tuplesort (for parallel B-Tree index creation)
Previous Message Claudio Freire 2016-09-08 17:18:49 Re: Parallel tuplesort (for parallel B-Tree index creation)