RE: Improve eviction algorithm in ReorderBuffer

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Masahiko Sawada' <sawada(dot)mshk(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: RE: Improve eviction algorithm in ReorderBuffer
Date: 2024-02-06 05:45:07
Message-ID: TYCPR01MB120772441516B0F7F43C90A2AF5462@TYCPR01MB12077.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dear Sawada-san,

> Thank you for the review comments!
>
> >
> > A comment for 0001:
> >
> > 01.
> > ```
> > +static void
> > +bh_enlarge_node_array(binaryheap *heap)
> > +{
> > + if (heap->bh_size < heap->bh_space)
> > + return;
> > +
> > + heap->bh_space *= 2;
> > + heap->bh_nodes = repalloc(heap->bh_nodes,
> > + sizeof(bh_node_type) * heap->bh_space);
> > +}
> > ```
> >
> > I'm not sure it is OK to use repalloc() for enlarging bh_nodes. This data
> > structure public one and arbitrary codes and extensions can directly refer
> > bh_nodes. But if the array is repalloc()'d, the pointer would be updated so that
> > their referring would be a dangling pointer.
>
> Hmm I'm not sure this is the case that we need to really worry about,
> and cannot come up with a good use case where extensions refer to
> bh_nodes directly rather than binaryheap. In PostgreSQL codes, many
> Nodes already have pointers and are exposed.

Actually, me neither. I could not come up with the use-case - I just said the possibility.
If it is not a real issue, we can ignore.

>
> >
> > 04.
> > ```
> > extern binaryheap *binaryheap_allocate(int capacity,
> > binaryheap_comparator compare,
> > - void *arg);
> > + bool indexed, void *arg);
> > ```
> >
> > I felt pre-existing API should not be changed. How about adding
> > binaryheap_allocate_extended() or something which can specify the `bool
> indexed`?
> > binaryheap_allocate() sets heap->bh_indexed to false.
>
> I'm really not sure it's worth inventing a
> binaryheap_allocate_extended() function just for preserving API
> compatibility. I think it's generally a good idea to have
> xxx_extended() function to increase readability and usability, for
> example, for the case where the same (kind of default) arguments are
> passed in most cases and the function is called from many places.
> However, we have a handful binaryheap_allocate() callers, and I
> believe that it would not hurt the existing callers.

I kept (external) extensions which uses binaryheap APIs in my mind.
I thought we could avoid to raise costs for updating their codes. But I could
understand the change is small, so ... up to you.

> > 05.
> > ```
> > +extern void binaryheap_update_up(binaryheap *heap, bh_node_type d);
> > +extern void binaryheap_update_down(binaryheap *heap, bh_node_type d);
> > ```
> >
> > IIUC, callers must consider whether the node should be shift up/down and use
> > appropriate function, right? I felt it may not be user-friendly.
>
> Right, I couldn't come up with a better interface.
>
> Another idea I've considered was that the caller provides a callback
> function where it can compare the old and new keys. For example, in
> reorderbuffer case, we call like:
>
> binaryheap_update(rb->txn_heap, PointerGetDatum(txn),
> ReorderBufferTXNUpdateCompare, (void *) &old_size);
>
> Then in ReorderBufferTXNUpdateCompare(),
> ReorderBufferTXN *txn = (ReorderBufferTXN *) a;Size old_size = *(Size *) b;
> (compare txn->size to "b" ...)
>
> However it seems complicated...
>

I considered similar approach: accept old node as an argument of a compare function.
But it requires further memory allocation. Do someone have better idea?

>
> >
> > 08.
> > IIUC, if more than 1024 transactions are running but they have small amount of
> > changes, the performance may be degraded, right? Do you have a result in
> sucha
> > a case?
>
> I've run a benchmark test that I shared before[1]. Here are results of
> decoding a transaction that has 1M subtransaction each of which has 1
> INSERT:
>
> HEAD:
> 1810.192 ms
>
> HEAD w/ patch:
> 2001.094 ms
>
> I set a large enough value to logical_decoding_work_mem not to evict
> any transactions. I can see about about 10% performance regression in
> this case.

Thanks for running. I think this workload is the worst and an extreme case which
would not be occurred on the real system (Such a system should be fixed), so we
can say that the regression is up to -10%. I felt it could be negligible but how
do other think?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2024-02-06 05:46:42 Re: Make COPY format extendable: Extract COPY TO format implementations
Previous Message Peter Smith 2024-02-06 05:15:49 Re: Synchronizing slots from primary to standby