Re: GiST kNN search queue (Re: KNN-GiST with recheck)

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GiST kNN search queue (Re: KNN-GiST with recheck)
Date: 2014-12-22 10:07:35
Message-ID: 5497ED67.2090109@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 12/20/2014 10:50 PM, Peter Geoghegan wrote:
> On Wed, Dec 17, 2014 at 6:07 AM, Heikki Linnakangas
> <hlinnakangas(at)vmware(dot)com> wrote:
>> How about adding a src/backend/lib/README for that, per attached?
>
> I took a quick look at this. Some observations:
>
> * I like the idea of adding a .../lib README. However, the README
> fails to note that ilist.c implements an *integrated* list, unlike the
> much more prevalent cell-based List structure. It should note that,
> since that's the whole point of ilist.c.

Added a sentence on that.

> * pairingheap_remove() is technically dead code. It still makes sense
> that you'd have it in this patch, but I think there's an argument for
> not including it at all on the theory that if you need to use it you
> should use a different data structure. After all, the actual
> (non-amortized) complexity of that operation is O(n) [1], and if
> remove operations are infrequent as we might expect, that might be the
> more important consideration. As long as you are including
> pairingheap_remove(), though, why is the local variable "prev_ptr" a
> pointer to a pointer to a pairingheap_node, rather than just a pointer
> to a pairingheap_node?
>
> * Similarly, the function-like macro pairingheap_reset() doesn't seem
> to pull its weight. Why does it exist alongside pairingheap_free()?
> I'm not seeing a need to re-use a heap like that.

pairingheap_remove and pairingheap_reset are both unused in this patch,
but they were needed for the other use case, tracking snapshots to
advance xmin more aggressively, discussed here:
http://www.postgresql.org/message-id/5488ACF0.8050901@vmware.com. In
fact, without the pairingheap_remove() operation, the prev_or_parent
pointer wouldn't be necessarily at all. We could've added them as a
separate patch, but that seems like unnecessary code churn.

The prev_ptr variable is used to set the parent's first_child pointer,
or the previous sibling's next_sibling pointer, depending on whether the
removed node is the parent's first child or not. I'll add more comments
in pairingheap_remove to explain that.

> * "Assert(!pairingheap_is_empty(heap))" appears in a few places.
> You're basically asserting that a pointer isn't null, often
> immediately before dereferencing the pointer. This seems to be of
> questionable value.

I copied that from binaryheap.c. It has some documentation value; they
make it easy to see that the functions require the heap to not be empty.
It's also explained in comments, but still.

> * I think that the indentation of code could use some tweaking.
>
> * More comments, please. In particular, comment the struct fields in
> pairingheap_node. There are various blocks of code that could use at
> least an additional terse comment, too.

Added some comments, hope it's better now.

> * You talked about tuplesort.c integration. In order for that to
> happen, I think the comparator logic should know less about min-heaps.
> This should formally be a max-heap, with the ability to provide
> customizations only encapsulated in the comparator (like inverting the
> comparison logic to get a min-heap, or like custom NULLs first/last
> behavior). So IMV this comment should be more generic/anticipatory:
>
> + /*
> + * For a max-heap, the comparator must return <0 iff a < b, 0 iff a == b,
> + * and >0 iff a > b. For a min-heap, the conditions are reversed.
> + */
> + typedef int (*pairingheap_comparator) (const pairingheap_node *a,
> const pairingheap_node *b, void *arg);
>
> I think the functions should be called pairing_max_heap* for this
> reason, too. Although that isn't consistent with binaryheap.c, so I
> guess this whole argument is a non-starter.

I don't see what the problem is. The pairingheap.c (and binaryheap.c)
code works the same for min and max-heaps. The comments assume a
max-heap in a few places, but that seems OK to me in the context.

> * We should just move rbtree.c to .../lib. We're not using CVS anymore
> -- the history will be magically preserved.

Yeah, I tend to agree. Tom Lane has not liked moving things, because it
breaks back-patching. That's true in general, even though git has some
smarts to follow renames. I think it would work in this case, though.
Anyway, let's discuss and do that as a separate patch, so that we don't
get stuck on that.

> Anyway, to get to the heart of the matter: in general, I think the
> argument for the patch is sound. It's not a stellar improvement, but
> it's worthwhile. That's all I have for now...

Ok, thanks for the review! I have committed this, with some cleanup and
more comments added.

- Heikki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2014-12-22 10:16:11 Moving src/backend/utils/misc/rbtree.c to src/backend/lib
Previous Message Petr Jelinek 2014-12-22 09:07:08 Re: TABLESAMPLE patch