Re: Improve eviction algorithm in ReorderBuffer

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Shubham Khanna <khannashubham1197(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Improve eviction algorithm in ReorderBuffer
Date: 2024-04-08 12:29:59
Message-ID: CAD21AoAtf12e9Z9NLBuaO1GjHMMo16_8R-yBu9Q9jrk2QLqMEA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Apr 6, 2024 at 5:44 AM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>
> >
> > It sounds like a data structure that mixes the hash table and the
> > binary heap and we use it as the main storage (e.g. for
> > ReorderBufferTXN) instead of using the binary heap as the secondary
> > data structure. IIUC with your idea, the indexed binary heap has a
> > hash table to store elements each of which has its index within the
> > heap node array. I guess it's better to create it as a new data
> > structure rather than extending the existing binaryheap, since APIs
> > could be very different. I might be missing something, though.
>
> You are right that this approach starts to feel like a new data
> structure and is not v17 material.
>
> I am interested in this for v18 though -- we could make the API more
> like simplehash to be more flexible when using values (rather than
> pointers) and to be able to inline the comparator.

Interesting project. It would be great if we could support increasing
and decreasing the key as APIs. The current
binaryheap_update_{up|down} APIs are not very user-friendly.

>
> > > * remove the hash table from binaryheap.c
> > >
> > > * supply a new callback to the binary heap with type like:
> > >
> > > typedef void (*binaryheap_update_index)(
> > > bh_node_type node,
> > > int new_element_index);
> > >
> > > * make the remove, update_up, and update_down methods take the
> > > element
> > > index rather than the pointer
>
> ...
>
> > This shows that the current indexed binaryheap is much slower than
> > the
> > other two implementations, and the xx_binaryheap has a good number in
> > spite of also being indexed.
>
> xx_binaryheap isn't indexed though, right?

Well, yes. To be xact, xx_binaryheap isn't indexed but the element
indexes are stored in the element itself (see test_elem struct) so the
caller still can update the key using xx_binaryheap_update_{up|down}.

>
> > When it comes to
> > implementing the above idea (i.e. changing binaryheap to
> > xx_binaryheap), it was simple since we just replace the code where we
> > update the hash table with the code where we call the callback, if we
> > get the consensus on API change.
>
> That seems reasonable to me.
>
> > The fact that we use simplehash for the internal hash table might
> > make
> > this idea complex. If I understand your suggestion correctly, the
> > caller needs to tell the hash table the hash function when creating a
> > binaryheap but the hash function needs to be specified at a compile
> > time. We can use a dynahash instead but it would make the binaryheap
> > slow further.
>
> simplehash.h supports private_data, which makes it easier to track a
> callback.
>
> In binaryheap.c, that would look something like:
>
> static inline uint32
> binaryheap_hash(bh_nodeidx_hash *tab, uint32 key)
> {
> binaryheap_hashfunc hashfunc = tab->private_data;
> return hashfunc(key);
> }
>
> ...
> #define SH_HASH_KEY(tb, key) binaryheap_hash(tb, key)
> ...
>
> binaryheap_allocate(int num_nodes, binaryheap_comparator compare,
> void *arg, binaryheap_hashfunc hashfunc)
> {
> ...
> if (hashfunc != NULL)
> {
> /* could have a new structure, but we only need to
> * store one pointer, so don't bother with palloc/pfree */
> void *private_data = (void *)hashfunc;
> heap->bh_nodeidx = bh_nodeidx_create(..., private_data);
> ...
>
>
> And in reorderbuffer.c, define the callback like:
>
> static uint32
> reorderbuffer_xid_hash(TransactionId xid)
> {
> /* fasthash32 is 'static inline' so may
> * be faster than hash_bytes()? */
> return fasthash32(&xid, sizeof(TransactionId), 0);
> }
>

Thanks, that's a good idea.

>
>
> In summary, there are two viable approaches for addressing the concerns
> in v17:
>
> 1. Provide callback to update ReorderBufferTXN->heap_element_index, and
> use that index (rather than the pointer) for updating the heap key
> (transaction size) or removing elements from the heap.
>
> 2. Provide callback for hashing, so that binaryheap.c can hash the xid
> value rather than the pointer.
>
> I don't have a strong opinion about which one to use. I prefer
> something closer to #1 for v18, but for v17 I suggest whichever one
> comes out simpler.

I've implemented prototypes of both ideas, and attached the draft patches.

I agree with you that something closer to #1 is for v18. Probably we
can implement the #1 idea while making binaryheap codes template like
simplehash.h. For v17, changes for #2 are smaller, but I'm concerned
that the new API that requires a hash function to be able to use
binaryheap_update_{up|down} might not be user friendly. In terms of
APIs, I prefer #1 idea. And changes for #1 can make the binaryheap
code simple, although it requires adding a variable in
ReorderBufferTXN instead. But overall, it can remove the hash table
and some functions so it looks better to me.

When it comes to performance overhead, I mentioned that there is some
regression in the current binaryheap even without indexing. Since
function calling contributed to the regression, inlining some
functions reduced some overheads. For example, inlining set_node() and
replace_node(), the same benchmark test I used showed:

postgres(1:88476)=# select * from generate_series(1,3) x(x), lateral
(select * from bench_load(false, 10000000 * (1+x-x)));
x | cnt | load_ms | xx_load_ms | old_load_ms
---+----------+---------+------------+-------------
1 | 10000000 | 502 | 624 | 427
2 | 10000000 | 503 | 622 | 428
3 | 10000000 | 502 | 621 | 427
(3 rows)

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
01_use_update_index_func_in_binaryheap.patch application/octet-stream 16.4 KB
02_use_hashfunc_in_binaryheap.patch application/octet-stream 8.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Borisov 2024-04-08 12:42:01 Re: [PoC] Improve dead tuple storage for lazy vacuum
Previous Message Jelte Fennema-Nio 2024-04-08 12:27:44 Re: Flushing large data immediately in pqcomm