Re: Improve eviction algorithm in ReorderBuffer

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Shubham Khanna <khannashubham1197(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Improve eviction algorithm in ReorderBuffer
Date: 2024-03-12 05:23:08
Message-ID: CAD21AoDBr-KB4yp02JWVoFFuVSA0R61890sHXcMsrBvN0rK6Gg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 8, 2024 at 12:58 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> On Thu, Mar 7, 2024 at 2:16 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Tue, Mar 5, 2024 at 3:28 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> > >
>
> > > 4a.
> > > The comment in simplehash.h says
> > > * The following parameters are only relevant when SH_DEFINE is defined:
> > > * - SH_KEY - ...
> > > * - SH_EQUAL(table, a, b) - ...
> > > * - SH_HASH_KEY(table, key) - ...
> > > * - SH_STORE_HASH - ...
> > > * - SH_GET_HASH(tb, a) - ...
> > >
> > > So maybe it is nicer to reorder the #defines in that same order?
> > >
> > > SUGGESTION:
> > > +#define SH_PREFIX bh_nodeidx
> > > +#define SH_ELEMENT_TYPE bh_nodeidx_entry
> > > +#define SH_KEY_TYPE bh_node_type
> > > +#define SH_SCOPE extern
> > > +#ifdef FRONTEND
> > > +#define SH_RAW_ALLOCATOR pg_malloc0
> > > +#endif
> > >
> > > +#define SH_DEFINE
> > > +#define SH_KEY key
> > > +#define SH_EQUAL(tb, a, b) (memcmp(&a, &b, sizeof(bh_node_type)) == 0)
> > > +#define SH_HASH_KEY(tb, key) \
> > > + hash_bytes((const unsigned char *) &key, sizeof(bh_node_type))
> > > +#include "lib/simplehash.h"
> >
> > I'm really not sure it helps increase readability. For instance, for
> > me it's readable if SH_DEFINE and SH_DECLARE come to the last before
> > #include since it's more obvious whether we want to declare, define or
> > both. Other simplehash.h users also do so.
> >
>
> OK.
>
> > > 5.
> > > + *
> > > + * If 'indexed' is true, we create a hash table to track of each node's
> > > + * index in the heap, enabling to perform some operations such as removing
> > > + * the node from the heap.
> > > */
> > > binaryheap *
> > > -binaryheap_allocate(int capacity, binaryheap_comparator compare, void *arg)
> > > +binaryheap_allocate(int capacity, binaryheap_comparator compare,
> > > + bool indexed, void *arg)
> > >
> > > BEFORE
> > > ... enabling to perform some operations such as removing the node from the heap.
> > >
> > > SUGGESTION
> > > ... to help make operations such as removing nodes more efficient.
> > >
> >
> > But these operations literally require the indexed binary heap as we
> > have an assertion:
> >
> > void
> > binaryheap_remove_node_ptr(binaryheap *heap, bh_node_type d)
> > {
> > bh_nodeidx_entry *ent;
> >
> > Assert(!binaryheap_empty(heap) && heap->bh_has_heap_property);
> > Assert(heap->bh_indexed);
> >
>
> I didn’t quite understand -- the operations mentioned are "operations
> such as removing the node", but binaryheap_remove_node() also removes
> a node from the heap. So I still felt the comment wording of the patch
> is not quite correct.

Now I understand your point. That's a valid point.

>
> Now, if the removal of a node from an indexed heap can *only* be done
> using binaryheap_remove_node_ptr() then:
> - the other removal functions (binaryheap_remove_*) probably need some
> comments to make sure nobody is tempted to call them directly for an
> indexed heap.
> - maybe some refactoring and assertions are needed to ensure those
> *cannot* be called directly for an indexed heap.
>

If the 'index' is true, the caller can not only use the existing
functions but also newly added functions such as
binaryheap_remove_node_ptr() and binaryheap_update_up() etc. How about
something like below?

* If 'indexed' is true, we create a hash table to track each node's
* index in the heap, enabling to perform some operations such as
* binaryheap_remove_node_ptr() etc.

>
> And, here are some review comments for v8-0002.
>
> ======
> 1. delete_nodeidx
>
> +/*
> + * Remove the node's index from the hash table if the heap is indexed.
> + */
> +static bool
> +delete_nodeidx(binaryheap *heap, bh_node_type node)
> +{
> + if (!binaryheap_indexed(heap))
> + return false;
> +
> + return bh_nodeidx_delete(heap->bh_nodeidx, node);
> +}
>
> 1a.
> In v8 this function was changed to now return bool, so, I think the
> function comment should explain the meaning of that return value.
>
> ~
>
> 1b.
> I felt the function body is better expressed positively: "If this then
> do that", instead of "If not this then do nothing otherwise do that"
>
> SUGGESTION
> if (binaryheap_indexed(heap))
> return bh_nodeidx_delete(heap->bh_nodeidx, node);
>
> return false;
>
> ~~~
>
> 2.
> +static void
> +replace_node(binaryheap *heap, int index, bh_node_type new_node)
> +{
> + bool found PG_USED_FOR_ASSERTS_ONLY;
> +
> + /* Quick return if not necessary to move */
> + if (heap->bh_nodes[index] == new_node)
> + return;
> +
> + /*
> + * Remove overwritten node's index. The overwritten node's position must
> + * have been tracked, if enabled.
> + */
> + found = delete_nodeidx(heap, heap->bh_nodes[index]);
> + Assert(!binaryheap_indexed(heap) || found);
> +
> + /*
> + * Replace it with the given new node. This node's position must also be
> + * tracked as we assume to replace the node by the existing node.
> + */
> + found = set_node(heap, new_node, index);
> + Assert(!binaryheap_indexed(heap) || found);
> +}
>
> 2a.
> /Remove overwritten/Remove the overwritten/
> /replace the node by the existing node/replace the node with the existing node/
>
> ~
>
> 2b.
> It might be helpful to declare another local var...
> bh_node_type cur_node = heap->bh_nodes[index];
>
> ... because I think it will be more readable to say:
> + if (cur_node == new_node)
> + return;
>
> and
>
> + found = delete_nodeidx(heap, cur_node);

As for changes around delete_nodeidx(), I've changed the
delete_nodeidx() to return nothing as it would not be helpful much and
seems confusing. I've simplified replace_node() logic accordingly.

I'll update 0003 patch to address your comment and submit the updated
version patches.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2024-03-12 05:28:01 Re: Regardign RecentFlushPtr in WalSndWaitForWal()
Previous Message Oleg Tselebrovskiy 2024-03-12 04:45:03 Re: [PROPOSAL] Skip test citext_utf8 on Windows