Re: Improve eviction algorithm in ReorderBuffer

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Shubham Khanna <khannashubham1197(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Improve eviction algorithm in ReorderBuffer
Date: 2024-03-13 02:22:59
Message-ID: CAHut+Ps6nk+eoa0AzH3YQjAW46qkiNU_RC+qJhd_N30P8NWsXA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 13, 2024 at 12:48 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Wed, Mar 13, 2024 at 10:15 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> >
> > On Tue, Mar 12, 2024 at 4:23 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > On Fri, Mar 8, 2024 at 12:58 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> > > >
> > ...
> > > > > > 5.
> > > > > > + *
> > > > > > + * If 'indexed' is true, we create a hash table to track of each node's
> > > > > > + * index in the heap, enabling to perform some operations such as removing
> > > > > > + * the node from the heap.
> > > > > > */
> > > > > > binaryheap *
> > > > > > -binaryheap_allocate(int capacity, binaryheap_comparator compare, void *arg)
> > > > > > +binaryheap_allocate(int capacity, binaryheap_comparator compare,
> > > > > > + bool indexed, void *arg)
> > > > > >
> > > > > > BEFORE
> > > > > > ... enabling to perform some operations such as removing the node from the heap.
> > > > > >
> > > > > > SUGGESTION
> > > > > > ... to help make operations such as removing nodes more efficient.
> > > > > >
> > > > >
> > > > > But these operations literally require the indexed binary heap as we
> > > > > have an assertion:
> > > > >
> > > > > void
> > > > > binaryheap_remove_node_ptr(binaryheap *heap, bh_node_type d)
> > > > > {
> > > > > bh_nodeidx_entry *ent;
> > > > >
> > > > > Assert(!binaryheap_empty(heap) && heap->bh_has_heap_property);
> > > > > Assert(heap->bh_indexed);
> > > > >
> > > >
> > > > I didn’t quite understand -- the operations mentioned are "operations
> > > > such as removing the node", but binaryheap_remove_node() also removes
> > > > a node from the heap. So I still felt the comment wording of the patch
> > > > is not quite correct.
> > >
> > > Now I understand your point. That's a valid point.
> > >
> > > >
> > > > Now, if the removal of a node from an indexed heap can *only* be done
> > > > using binaryheap_remove_node_ptr() then:
> > > > - the other removal functions (binaryheap_remove_*) probably need some
> > > > comments to make sure nobody is tempted to call them directly for an
> > > > indexed heap.
> > > > - maybe some refactoring and assertions are needed to ensure those
> > > > *cannot* be called directly for an indexed heap.
> > > >
> > >
> > > If the 'index' is true, the caller can not only use the existing
> > > functions but also newly added functions such as
> > > binaryheap_remove_node_ptr() and binaryheap_update_up() etc. How about
> > > something like below?
> > >
> >
> > You said: "can not only use the existing functions but also..."
> >
> > Hmm. Is that right? IIUC those existing "remove" functions should NOT
> > be called directly if the heap was "indexed" because they'll delete
> > the node from the heap OK, but any corresponding index for that
> > deleted node will be left lying around -- i.e. everything gets out of
> > sync. This was the reason for my original concern.
> >
>
> All existing binaryheap functions should be available even if the
> binaryheap is 'indexed'. For instance, with the patch,
> binaryheap_remote_node() is:
>
> void
> binaryheap_remove_node(binaryheap *heap, int n)
> {
> int cmp;
>
> Assert(!binaryheap_empty(heap) && heap->bh_has_heap_property);
> Assert(n >= 0 && n < heap->bh_size);
>
> /* compare last node to the one that is being removed */
> cmp = heap->bh_compare(heap->bh_nodes[--heap->bh_size],
> heap->bh_nodes[n],
> heap->bh_arg);
>
> /* remove the last node, placing it in the vacated entry */
> replace_node(heap, n, heap->bh_nodes[heap->bh_size]);
>
> /* sift as needed to preserve the heap property */
> if (cmp > 0)
> sift_up(heap, n);
> else if (cmp < 0)
> sift_down(heap, n);
> }
>
> The replace_node(), sift_up() and sift_down() update node's index as
> well if the binaryheap is indexed. When deleting the node from the
> binaryheap, it will also delete its index from the hash table.
>

I see now. Thanks for the information.

~~~

Some more review comments for v8-0002

======

1.
+/*
+ * Remove the node's index from the hash table if the heap is indexed.
+ */
+static bool
+delete_nodeidx(binaryheap *heap, bh_node_type node)
+{
+ if (!binaryheap_indexed(heap))
+ return false;
+
+ return bh_nodeidx_delete(heap->bh_nodeidx, node);
+}

I wasn't sure if having this function was a good idea. Yes, it makes
code more readable, but I felt the heap code ought to be as efficient
as possible so maybe it is better for the index check to be done at
the caller, instead of incurring any overhead of function calls that
might do nothing.

SUGGESTION
if (binaryheap_indexed(heap))
found = bh_nodeidx_delete(heap->bh_nodeidx, node);

~~~

2.
+/*
+ * binaryheap_update_up
+ *
+ * Sift the given node up after the node's key is updated. The caller must
+ * ensure that the given node is in the heap. O(log n) worst case.
+ *
+ * This function can be used only if the heap is indexed.
+ */
+void
+binaryheap_update_up(binaryheap *heap, bh_node_type d)
+{
+ bh_nodeidx_entry *ent;
+
+ Assert(!binaryheap_empty(heap) && heap->bh_has_heap_property);
+ Assert(binaryheap_indexed(heap));
+
+ ent = bh_nodeidx_lookup(heap->bh_nodeidx, d);
+ Assert(ent);
+ Assert(ent->index >= 0 && ent->index < heap->bh_size);
+
+ sift_up(heap, ent->index);
+}
+
+/*
+ * binaryheap_update_down
+ *
+ * Sift the given node down after the node's key is updated. The caller must
+ * ensure that the given node is in the heap. O(log n) worst case.
+ *
+ * This function can be used only if the heap is indexed.
+ */
+void
+binaryheap_update_down(binaryheap *heap, bh_node_type d)
+{
+ bh_nodeidx_entry *ent;
+
+ Assert(!binaryheap_empty(heap) && heap->bh_has_heap_property);
+ Assert(binaryheap_indexed(heap));
+
+ ent = bh_nodeidx_lookup(heap->bh_nodeidx, d);
+ Assert(ent);
+ Assert(ent->index >= 0 && ent->index < heap->bh_size);
+
+ sift_down(heap, ent->index);
+}

Since those functions are almost identical, wouldn't it be better to
combine them, passing the sift direction?

SUGGESTION
binaryheap_resift(binaryheap *heap, bh_node_type d, bool sift_dir_up)
{
...

if (sift_dir_up)
sift_up(heap, ent->index);
else
sift_down(heap, ent->index);
}

----------
Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2024-03-13 02:38:42 Re: Add new error_action COPY ON_ERROR "log"
Previous Message Thomas Munro 2024-03-13 02:03:01 Re: CI speed improvements for FreeBSD